ABSTRACT


Currently, the Department of Defense runs its special-purpose applications on
dedicated hardware (i.e., on “stovepipe” systems). Such systems have inherent
disadvantages. They cannot handle the resource contention that often occurs
upon the influx of a large number of applications. A new application needing to use a
given resource must typically wait for any preceding applications to finish their use
instead of searching out another capable resource. An even worse scenario occurs when the
system fails and no applications can run until the system is repaired and brought back
online. In either case, important decisions can potentially be delayed or made without
important information. The Management System for Heterogeneous Networks (MSHN)
will mitigate these deficiencies.

The  goal  of  MSHN  is  to  manage  several  different  types  of  applications  across  a 
changing  heterogeneous  network.  MSHN  determines  the  best  resource  on  which  to  run  an 
application based on both the application’s and the overall system’s Quality of Service (QoS).
The focus of this thesis is to develop an algorithm that determines and updates
distribution statistics for the end-to-end QoS resource usage of an application program,
and to demonstrate its worth to MSHN. These distributions are vital in assisting MSHN in the
scheduling and rescheduling of applications across a network.
I. INTRODUCTION


This  thesis  investigates  some  of  the  issues  and  problems  associated  with  acquiring 
distribution  statistics  for  an  application’s  resource  usage  while  running  in  a  heterogeneous 
distributed  network.  These  distribution  statistics  are  an  aggregated  and  more  usable  form 
of  the  raw  data  provided  by  the  Client  Library  in  the  Management  System  for 
Heterogeneous  Networks  (MSHN).  The  resulting  distributions  from  this  study  can  be  used 
by  MSHN  to  determine,  for  all  machines  on  the  network  and  with  some  probability,  an 
application’s  runtime,  required  memory,  and  data  stream  statistics,  among  many  others. 
MSHN  can  also  use  this  information  to  help  in  the  scheduling  and  rescheduling  of 
applications  across  the  network. 

A.  BACKGROUND 

In the early 1990s, the Heterogeneous Computing Team at the US Navy’s
NCCOSC RDTE Division in San Diego developed a scheduling framework called
SmartNet for managing jobs and resources in a heterogeneous computing environment.
The  designers  of  SmartNet  incorporated  six  innovative  techniques  to  improve  its 
performance  over  other  types  of  Resource  Management  Systems  (RMSs).  The  six 
innovations  were  (1)  how  SmartNet  recognized  and  exploited  heterogeneity,  (2)  its 
development  of  Compute  Characteristics,  (3)  its  ability  to  handle  uncertainty,  (4)  how  it 
accounted  for  the  sharing  of  resources  in  a  distributed  environment,  (5)  its  view  of 
optimization criteria, and (6) the methods it employed to search the scheduling space
[KIDD96].  In  addition,  SmartNet  has  the  ability  to  make  the  scheduler  aware  of  two  very 
important  elements  in  a  heterogeneous  network  that  previous  distributed  environments  did 
not:  (1)  the  load  on  the  machines  in  the  network,  and  (2)  an  application’s  affinity  for  a 
particular  machine  on  the  network.  During  its  heyday,  SmartNet  ran  in  several  computing 
centers  across  the  United  States.  For  a  full  description  of  SmartNet,  see  [KIDD96]. 
SmartNet  has  spawned  other  projects  in  the  research  area  of  distributed  computing  in 
heterogeneous  environments.  One  such  project  is  the  Management  System  for 
Heterogeneous  Networks  (MSHN). 
MSHN is a DARPA-sponsored project funded to investigate and demonstrate, in a
Department  of  Defense  (DoD)  context,  an  environment  that  supports  and  illustrates  the 
usefulness  of  adaptive  applications.  Currently,  the  DoD  runs  its  special  purpose 
applications  on  dedicated  hardware  (i.e.,  on  “stovepipe”  systems).  An  inherent 
disadvantage  in  using  a  stovepipe  system  is  its  inability  to  handle  the  resource  contention 
that often occurs upon the influx of a large number of tasks. In such systems, a new
application  needing  to  use  a  given  resource  must  typically  wait  for  any  preceding 
applications  to  first  finish  their  use  instead  of  searching  out  another  capable  resource.  An 
even  worse  scenario  occurs  when  the  system  fails  and  no  applications  can  run  until  the 
system is repaired and brought back online. In either case, important decisions can
potentially be delayed or made without important information. MSHN will mitigate these
deficiencies.  The  goal  of  MSHN  is  to  manage  several  different  types  of  tasks  across  a 
changing  heterogeneous  network.  MSHN  determines  the  best  resource  on  which  to  run  an 
application based on both the application’s and the overall system’s Quality of Service (QoS).
Some  factors  influencing  QoS  include  security,  deadlines,  priorities,  adaptability  and 
resource  availability.  Much  of  this  information  is  stored  in  two  databases/status  servers  in 
the  MSHN  architecture  (see  Figure  1  below). 
Figure 1: MSHN Architecture with Permission From [HENS99]

To date, among many other things, MSHN has demonstrated the ability to collect
elemental  QoS  data  (e.g.,  CPU  time,  total  memory  used,  and  network  latency)  from 
adaptive  applications  [SCHN99,  PORT99].  The  focus  of  this  thesis  is  to  develop  and 
demonstrate  the  worth  of  algorithms  that  determine  and  update  the  distribution  statistics  of 
this  elemental  data.  For  example,  the  resulting  distributions  from  this  study  can  be  used  by 
MSHN to better determine, for all machines on the network and with some probability, an
application’s  system  and  user  runtimes,  and  required  memory.  MSHN  can  also  use  this 
information to help determine the scheduling and rescheduling of applications across the
network. 

B.  SCOPE  OF  THIS  THESIS 

The  MSHN  architecture  consists  of  ten  major  components:  (1)  the  Scheduling 
Advisor,  (2)  the  Client  Library,  (3)  the  Resource  Status  Server,  (4)  the  Resource 
Requirements  Database,  (5)  the  MSHN  Daemon,  (6)  an  application,  (7)  the  Application 
Emulator, (8) an adaptation-aware application, (9) resources, and (10) the Visualizer. The
functionality of these components is discussed in Chapter II. This thesis focuses
primarily  on  the  Client  Library  (CL),  Resource  Requirements  Database  (RRD),  and 
Resource  Status  Server  (RSS).  The  CL  serves  many  purposes.  It  links  with  legacy, 
adaptive,  and  MSHN-aware  applications  providing  (1)  a  transparent  interface  to  all  of  the 
other  components,  and  (2)  a  mechanism  for  intercepting  system  calls  for  the  collection  of 
resource  usage  and  status  information.  The  collected  information  is  then  forwarded  to  and 
stored  in  the  RSS  and  the  RRD  for  use  by  the  Scheduling  Advisor  (SA)  and  the  Visualizer 
(VIS). 

The  first  part  of  the  objective  of  this  thesis  is  to  design  an  algorithm  that  calculates 
the  distribution  statistics  for  both  the  resource  usage  of  applications  linked  with  the  CL, 
and  the  resource  status  of  the  various  resources  on  the  network  (e.g.,  computers  and  the 
network  itself).  The  distribution  statistics  can  then  be  stored  in  the  RRD  and  the  RSS  for 
use  by  the  SA  and  VIS. 

Specifically,  this  thesis  answers  the  following  questions: 

•  Can specialized algorithms, ones that rely on knowing the underlying
distribution  of  the  data,  be  useful  in  MSHN  or  is  a  more  generalized  stochastic 
algorithm  needed? 

•  What  parameters  or  set  of  parameters  should  be  stored  in  the  RSS  and  RRD? 
How  much  of  the  data  produced  from  the  client  wrapper  should  be  maintained 
for  use  in  determining  the  distribution  statistics? 
C.  MAJOR  CONTRIBUTIONS  OF  THIS  THESIS


This  thesis  makes  contributions  to  MSHN  and  the  DoD.  First,  this  thesis  provides 
procedures  for  aggregating  and  analyzing  data  collected  by  MSHN’s  Client  Library.  These 
procedures aggregate raw data into a statistical form more usable for decision making by
the  MSHN  components.  Secondly,  this  thesis  demonstrates  that  these  statistics  are  of 
interest. 

This  research  benefits  the  DoD  in  supporting  application  mixes  which  will  be 
sharing resources in the future. As the DoD moves away from “stovepipe” systems and
towards  commercial-off-the-shelf  (COTS)  systems,  we  must  ensure  that  those  systems  (1) 
can  share  resources  (dedicated  hardware  will  no  longer  be  practical  for  most  real-time 
applications),  and  (2)  in  a  crisis  situation,  can  best  manage  the  use  of  and  maximize  the 
benefits  of  those  resources.  This  study  aids  in  understanding  how  to  best  move  towards 
such  COTS  systems. 

The  procedures  and  research  in  this  thesis  focused  on  aggregating  and  analyzing 
data  for  MSHN.  However,  it  should  be  clear  that  this  research  and  these  procedures,  with 
slight  modifications,  could  be  applied  to  numerous  data  aggregation  and  analysis  problems. 

D.  ORGANIZATION 

Each  chapter  of  this  thesis  begins  with  an  introduction  and  ends  with  a  brief 
summary. The body of this thesis is organized as follows: Chapter II discusses the goal of
MSHN, describes the functionality of the MSHN components, and explains where and
how this research fits into the MSHN program. Chapter III examines resource monitoring
systems,  and  the  potential  of  future  scheduling  algorithms  that  can  make  use  of  statistical 
information.  Chapter  IV  states  the  definition  of  the  problem,  enumerates  and  explains 
possible  approaches,  and  discusses  possible  solutions.  Chapter  V  describes  how  an 
application  is  wrapped  using  the  MSHN  Client  Library.  Chapter  VI  explains  the  design  of 
the  experiment,  and  discusses  and  provides  an  analysis  of  the  results.  The  final  chapter 
provides  a  summary  of  the  work  done  in  this  thesis  and  presents  possibilities  for  future 
work. 
II.  MSHN


The  primary  purpose  of  this  chapter  is  to  illustrate  the  importance  of  being  able  to 
calculate  distribution  statistics  for  MSHN.  For  this  illustration  to  be  clear,  the  reader  must 
have  some  level  of  understanding  of  how  MSHN  evolved,  what  MSHN’s  goals  are,  and 
how  the  components  of  the  MSHN  architecture  function.  A  tremendous  amount  of 
research  has  been  conducted  in  the  area  of  resource  management  and  none  more  important 
than  that  conducted  by  the  Heterogeneous  Computing  Team  at  the  US  Navy’s  facility  at 
the  NCCOSC  RDTE  Division  in  San  Diego.  This  initial  research  eventually  led  to  MSHN. 

A.  THE  EVOLUTION  OF  MSHN 

A heterogeneous computing environment executes many different types of I/O-intensive
and/or compute-intensive applications on many different types of computers. In
such an environment, the assignment of jobs to resources is generally done using one of
two basic systems, a Resource Management System (RMS) or a Distributed Operating
System (DistOS). The main goal of these types of systems is to transparently give
simultaneous  users  direct  access  to  many  different  types  of  powerful  computers  upon 
which  to  run  their  applications.  When  users  schedule  jobs  on  local  hosts,  two  important 
elements  are  taken  into  consideration:  (1)  how  busy  the  machine  is  (the  Load),  and  (2)  how 
well  a  job  runs  on  that  particular  machine  (the  Affinity).  For  example,  if  other  capable 
machines  are  available,  scheduling  a  job  on  a  machine  that  is  currently  running  several 
other  applications  may  not  make  sense.  On  the  other  hand,  scheduling  a  job  on  a  machine 
just  because  no  other  applications  are  running  on  it  may  not  make  sense  either,  especially  if 
the  job  does  not  execute  well  on  that  particular  machine.  Generally  speaking,  the  two 
systems mentioned above only consider machine load in their scheduling policies. An RMS
client  runs  on  a  host  while  accepting  jobs,  and  then  schedules  those  jobs  on  machines  that 
have  the  lightest  load.  In  contrast,  a  distributed  operating  system  controls  and  manages  its 
hardware  and  software  resources  in  a  manner  such  that  its  users  view  the  entire  system  as  a 
powerful  monolithic  computer  system. 

The example presented in [KIDD96] compares the times at which the last job
completes under three schedulers, each using a different scheduling policy. The scenario has
three machines, A, B and C, and four jobs, 1, 2, 3 and 4. Each scheduler computes a
schedule  based  on  its  scheduling  policy.  The  three  scheduling  policies  used  consider  load, 
affinity,  and  a  combination  of  load  and  affinity.  Scheduler  1  used  Opportunistic  Load 
Balancing  (OLB)  as  its  scheduling  policy.  OLB  is  similar  to  the  policies  used  by  most 
RMSs  and  DistOSs  in  that  it  assigns  the  next  queued  job  to  the  next  available  machine. 
Scheduler  2  used  a  policy  called  Limited  Best  Assignment  (LBA).  LBA  assigns  each  job 
to  the  machine  where  that  job  is  expected  to  run  the  fastest,  assuming  that  all  machines  are 
empty.  Scheduler  3  assigns  jobs  according  to  both  their  affinity  for  a  machine  and  the  load 
on  that  machine.  To  keep  the  experiment  simple,  the  authors  of  [KIDD96]  made  the 
following  three  assumptions:  (1)  every  job  executes  for  exactly  the  predicted  amount  of 
time,  (2)  jobs  all  arrive  simultaneously,  and  (3)  jobs  were  queued  in  the  specific  order  as 
seen  in  Table  1  below. 


JOBS      MACHINES
          A       B       C
1         4       17      7
2         5       11      6
3         4       16      8
4         11      4       9

Table  1:  Job  Execution  Lengths  with  Permission  From  [KIDD96] 
Using the job execution lengths given in Table 1, the three schedulers produce the
following  schedules  shown  below  in  Figure  2. 


[Figure 2 graphic omitted. Gantt-style charts of job placement on Machines A, B, and C for
three schedules; legible panel titles: Schedule 1 (Opportunistic Load Balancing) and
Schedule 3 (SmartNet).]


Figure  2:  Schedule  Comparison  with  Permission  From  [KIDD99] 

From Figure 2, it is clear that the schedulers whose policies only considered one of the
elements,  either  load  or  affinity,  performed  significantly  slower  than  the  SmartNet 
scheduler,  which  considered  both  elements.  In  fact,  the  SmartNet  scheduler  completed  the 
jobs  in  roughly  half  the  time  of  the  other  two  schedulers.  With  few  exceptions,  policies 
used  in  most  RMSs  and  DistOSs  only  consider  the  load  on  a  particular  machine  when 
scheduling  jobs,  much  like  Scheduler  1.  From  Figure  2,  we  can  also  see  that  in  a 
heterogeneous  computing  environment  better  schedules  are  possible  when  a  scheduler 
considers  both  how  busy  machines  are  and  how  well  applications  run  on  each  of  the 
machines.  A  scheduling  policy  considering  both  load  and  affinity  is  used  in  SmartNet. 
This policy was one of several innovations that improved SmartNet’s performance over
that of other existing RMSs. The other SmartNet performance innovations included its
ability to effectively deal with heterogeneity, the development of compute characteristics,
its ability to deal with uncertainty, how it accounted for resource sharing in a distributed
environment, its view of optimization criteria, and its methods for searching the
schedule  space  [KIDD96]. 
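
The comparison above can be replayed in a few lines. The following Python sketch applies the three policies to the Table 1 runtimes under the example's assumptions (exact predictions, simultaneous arrival, queue order 1 through 4). The function names are illustrative, and the third policy is a simple greedy earliest-completion rule used here as a stand-in for a load-and-affinity scheduler; it is not claimed to be SmartNet's actual search method.

```python
# Predicted runtimes from Table 1: RUNTIME[job][machine].
RUNTIME = {
    1: {"A": 4, "B": 17, "C": 7},
    2: {"A": 5, "B": 11, "C": 6},
    3: {"A": 4, "B": 16, "C": 8},
    4: {"A": 11, "B": 4, "C": 9},
}
MACHINES = ["A", "B", "C"]
JOB_ORDER = [1, 2, 3, 4]  # queue order assumed by the example


def run_policy(choose):
    """Assign jobs in queue order; `choose(job, ready)` picks a machine.

    `ready[m]` is the time at which machine m finishes its assigned work.
    Returns (assignment, makespan).
    """
    ready = {m: 0 for m in MACHINES}
    assignment = {}
    for job in JOB_ORDER:
        m = choose(job, ready)
        ready[m] += RUNTIME[job][m]
        assignment[job] = m
    return assignment, max(ready.values())


def olb(job, ready):
    # Load only: next available machine (ties broken alphabetically).
    return min(MACHINES, key=lambda m: (ready[m], m))


def lba(job, ready):
    # Affinity only: fastest machine for this job, ignoring current load.
    return min(MACHINES, key=lambda m: (RUNTIME[job][m], m))


def load_and_affinity(job, ready):
    # Both: the machine on which this job would finish earliest.
    return min(MACHINES, key=lambda m: (ready[m] + RUNTIME[job][m], m))


for name, policy in [("OLB", olb), ("LBA", lba),
                     ("load+affinity", load_and_affinity)]:
    assignment, makespan = run_policy(policy)
    print(name, assignment, "last job completes at", makespan)
```

Under these assumptions the last job completes at time 15 for OLB, 13 for LBA, and 8 for the combined rule, which is consistent with the observation that the scheduler considering both elements finishes in roughly half the time.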

SmartNet  was  designed  to  take  advantage  of  the  computer  heterogeneity  inherent  in 
a  distributed  networked  environment.  As  discussed  above,  certain  jobs  have  an  affinity  for 
certain  machines.  One  of  SmartNet’s  objectives  was  to  find  the  machine  upon  which  a  job 
would  execute  the  fastest.  Figure  3,  taken  from  [KIDD96],  shows  how  performance  is 
maximized  when  the  RMS  can  match  a  job’s  type  to  the  machine  upon  which  the  job 
executes  best. 


[Figure 3 graphic omitted; legible axis label: Program Type]


Figure  3:  Computer  Heterogeneity  with  Permission  From  [KIDD96] 
SmartNet’s ability to produce quality schedules relied heavily upon its being able
to produce good estimates of job runtimes. If SmartNet knew the exact runtimes of jobs a
priori,  scheduling  would  be  much  less  difficult  to  perform.  Unfortunately  this  was  not  the 
case.  The  next  best  thing  to  having  exact  job  runtimes  is  having  the  runtime  distributions 
of those jobs, provided the distribution is unimodal and has a small variance. Again,
this is typically not the case. In fact, job runtime distributions are often just the opposite in
that they have a large variance and are generally multimodal. To deal with these multimodal
distributions, the SmartNet Team partitioned the distribution into pieces. Compute
Characteristics  and  Compute  Characteristic  Operating  Points  (CCOPs)  define  the 
partitions.  Figure  4  below  shows  a  typical  runtime  distribution  and  how  it  might  look 
before  and  after  being  partitioned.  This  instance  shows  a  job  with  a  single  Compute 
Characteristic  and  three  different  CCOPs. 


[Figure 4 graphic omitted; legible panel titles: Runtime Distribution, Partitioned Runtime
Distribution]


Figure  4:  Partitioned  Runtime  with  Permission  From  [KIDD96] 
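
To illustrate why partitioning helps, the following Python sketch groups runtime samples by operating point and compares the spread within each partition to the spread of the pooled data. The sample values and the integer operating-point labels are hypothetical, chosen only to mimic a three-mode distribution; they are not taken from [KIDD96].

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (operating_point, runtime_seconds) observations.
SAMPLES = [
    (1, 2.0), (1, 2.2), (1, 1.8),
    (2, 6.0), (2, 6.3), (2, 5.7),
    (3, 12.0), (3, 11.6), (3, 12.4),
]


def partition_stats(samples):
    """Return {operating_point: (mean, population std dev)} per partition."""
    groups = defaultdict(list)
    for point, runtime in samples:
        groups[point].append(runtime)
    return {p: (mean(r), pstdev(r)) for p, r in sorted(groups.items())}


runtimes = [r for _, r in SAMPLES]
print("all runs:", round(mean(runtimes), 2), round(pstdev(runtimes), 2))
for point, (mu, sigma) in partition_stats(SAMPLES).items():
    print("operating point", point, round(mu, 2), round(sigma, 2))
```

In this made-up data the pooled standard deviation is far larger than that of any single partition, which is the property that makes each partition a more useful predictor than the pooled, multimodal distribution.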
Distributed environments are non-deterministic in nature for reasons such as
resources being shared and machines operating asynchronously. To improve
performance, SmartNet tracked and took into account the uncertainty caused by
non-determinism.

The final innovation of SmartNet was that it included in its Scheduler separate
optimization  and  search  engines.  The  criterion  in  the  optimization  engine  defined  the  best 
possible  solution  in  the  space  the  search  engine  explored. 

All of the above innovations were vital to SmartNet’s success. SmartNet has made
significant contributions to several government agencies such as the DoD, the National
Institutes of Health, and NASA. Perhaps even more important was the research conducted
to  build  SmartNet.  Today,  much  of  the  research  on  computing  in  a  distributed  and 
heterogeneous  environment  is  a  continuation  of  or  is  incidental  to  SmartNet.  MSHN  is  one 
of many projects that have benefited from the work done on the SmartNet project. For a
much  more  detailed  description  of  SmartNet’s  architecture  and  how  SmartNet  operated, 
see  [KIDD96]. 

B.  MSHN’S  GOALS 

An  important  difference  between  MSHN  and  SmartNet  is  that  SmartNet  was 
designed  to  be  a  fully  functional  and  implemented  product,  while  MSHN  is  a  research 
system.  MSHN  is  built  and  tuned  through  experimentation  with  the  desire  of  determining 
the best ways to build an RMS, and has much broader goals than SmartNet. MSHN has
three  overarching  goals.  First,  MSHN  has  to  account  for  the  overhead  of  jobs  sharing 
resources.  This  has  a  significant  impact  on  mapping  and  scheduling.  Second,  MSHN 
needs  to  support  adaptive  and  adaptive-aware  applications.  Adaptive  applications,  as 
defined  by  the  MSHN  researchers,  are  idempotent  applications  that  can  exist  in  several 
different  versions  [HENS99].  Third,  MSHN  needs  to  provide  good  Quality  of  Service 
(QoS)  to  several  different  sets  of  simultaneous  users,  each  of  whom  may  be  executing 
different types of jobs.

The  application  model  used  by  MSHN  is  much  more  complex  than  that  used  in 
SmartNet. In SmartNet, applications first acquire data from a data repository, then
compute  results  based  on  the  data  gathered,  and  finally  write  the  results  back  to  a  possibly 
different repository. Because acquiring the data and writing the data back to the repository
are  of  significantly  shorter  duration  than  computing  the  results,  SmartNet  assumed  there 
was  no  contention  for  either  the  network  or  the  data  repositories.  MSHN’s  applications  go 
through  many  more  phases,  each  of  variable  length,  and  therefore  MSHN  must  account  for 
the  resulting  overhead.  These  phases  are  discussed  in  Section  C  of  this  chapter. 

MSHN  has  the  ability  to  perform  a  cost  benefit  analysis,  terminating  if  necessary 
the  current  version  of  an  application  in  favor  of  a  version  that  will  better  meet  the  user’s 
QoS  expectations.  For  example,  suppose  a  user  has  two  applications,  one  that  can  generate 
and  display  a  full  video  of  the  latest  weather  patterns,  and  another  that  instead  produces  a 
succession  of  still  photos  of  the  patterns.  The  user  prefers  the  video  to  the  photos  but 
needs  to  see  one  or  the  other  immediately.  At  the  time  the  user  submits  this  application, 
MSHN  determines  that  there  are  insufficient  resources  to  run  the  video  (e.g.,  too  little 
bandwidth)  and  the  still  photos  are  shown  instead.  If,  after  a  short  period  of  time,  enough 
bandwidth  becomes  available,  MSHN  may  decide  to  terminate  the  photo  application  and 
begin execution of the video application. Though an oversimplification of the
process,  this  scenario  is  an  excellent  example  of  an  application  adapting  to  a  changing 
environment.  It  is  important  to  note  that  an  adaptive  application  does  not  necessarily  adapt 
without  some  sort  of  user  interaction,  either  at  the  time  the  application  is  submitted  or 
during  its  execution. 
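
The video-versus-photos scenario can be sketched as a choice among versions of one application. In the Python sketch below, the `Version` type, the preference ranks, and the bandwidth thresholds are all hypothetical; MSHN's actual cost-benefit analysis weighs more factors than a single resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    name: str
    preference: int          # higher means the user prefers it
    min_bandwidth_kbps: int  # hypothetical resource requirement


VERSIONS = [
    Version("video", preference=2, min_bandwidth_kbps=1500),
    Version("still_photos", preference=1, min_bandwidth_kbps=100),
]


def best_version(available_kbps, versions=VERSIONS):
    """Most-preferred version whose requirement fits, or None if none fits."""
    runnable = [v for v in versions if v.min_bandwidth_kbps <= available_kbps]
    return max(runnable, key=lambda v: v.preference) if runnable else None


# Bandwidth is scarce at submission time, then improves later.
for kbps in (300, 2000):
    print(kbps, "kbps ->", best_version(kbps).name)
```

Re-evaluating `best_version` as bandwidth changes reproduces the switch from still photos to video described above.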

Another type of application that MSHN can manage, similar to the adaptive application, is
called an adaptive-aware application. An adaptive-aware application has the ability to
sense  changes  in  its  environment.  Therefore,  an  adaptive-aware  application  can  sense  the 
decline  of  a  resource  (e.g.,  bandwidth)  and  adapt.  In  the  above  scenario,  perhaps  the 
adaptive-aware  application  will  show  the  still  photos.  In  addition,  the  adaptive-aware 
application  potentially  has  the  ability  to  migrate,  i.e.,  store  its  state  when  it  terminates  and 
restore  its  state  upon  execution.  For  example,  if  an  adaptive-aware  application  detects  a 
significant  loss  of  bandwidth,  it  can  take  the  following  actions:  stop  its  execution  in  place, 
store  its  current  state,  and  use  the  saved  state  to  immediately  start  presenting  the  still 
photos at approximately the same spot at which the video was terminated [HENS99].

Finally,  MSHN  researchers  are  defining  a  measure  that  encapsulates  many  different 
QoS  requirements.  This  measure  is  optimized  and  then  used  by  MSHN  to  map 
applications to resources. Much of how the above goals are going to be accomplished is
presented  in  the  following  section  on  MSHN  components  and  functionality. 

C.  MSHN  COMPONENTS  AND  FUNCTIONALITY 

This  section  is  dedicated  to  describing  how  the  separate  components  of  MSHN 
communicate  and  function.  It  presents  a  general  overview  of  the  system  as  a  whole 
followed  by  the  functionality  of  the  individual  components. 

As  seen  in  Chapter  I,  Figure  1,  MSHN  has  six  main  components.  These 
components can all exist on the same machine or can be distributed throughout the
network  on  different  machines.  The  Daemon  is  the  only  component  that  must  be  present 
on  all  machines  in  the  MSHN  environment.  A  possible  MSHN  set-up  is  presented  in 
Figure  5  below. 


[Figure 5 graphic omitted. Legible labels: Middleware, application, Daemon, Emulator, RRD,
and Operating System and Machine 1, 3, and 4.]


Figure 5: Physical Instantiation of MSHN Architecture with Permission From [HENS99]


When  an  adaptive  or  adaptive-aware  application  is  submitted  to  MSHN  for 
execution,  it  is  “wrapped”  in  MSHN’s  Client  Library  (CL).  The  CL  is  the  one  MSHN 
component  that  communicates  with  all  of  the  other  MSHN  components.  The  CL  intercepts 
an  application’s  calls  to  system  libraries.  Before  executing  a  new  process,  the  CL 
references a list of applications managed by MSHN. If the application is not on the list,
MSHN  passes  the  request  to  the  local  OS.  If  the  application  is  on  the  list,  the  request  is 
passed  to  the  MSHN  Scheduling  Advisor  (SA).  The  SA  determines  where  to  execute  the 
application  or  process.  This  decision  is  based  mostly  on  information  queried  from  the 
MSHN  Resource  Requirements  Database  (RRD)  and  the  MSHN  Resource  Status  Server 
(RSS). The RRD is a database that stores information on which resources, and how much
of each, a particular application requires. The RSS is
similar  to  the  RRD  in  that  it  too  is  a  database  storing  resource-related  information.  The 
RSS  stores  current  status  and  availability  information  on  all  of  the  resources  available  to 
MSHN.  Both  the  RRD  and  the  RSS  store  data  gathered  via  the  CL.  The  CL  passively 
monitors  and  reports  an  application’s  resource  usage  to  the  RRD  and  the  current  status  of 
resources  used  by  the  application  to  the  RSS.  Based  on  the  advice  of  the  SA,  the  CL 
contacts  the  MSHN  Daemon  on  the  appropriate  resource  and  requests  that  the  process  be 
started  on  that  particular  machine.  That  Daemon  then  starts  executing  the  process. 
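
The request flow just described can be summarized in a short Python sketch. Everything here is a simplified stand-in: the function name, the dictionary layouts for the RRD and RSS, and the rule of picking the least-loaded machine with enough free memory are assumptions for illustration, not MSHN's interfaces or the Scheduling Advisor's actual policy.

```python
def route_request(app, managed_apps, rrd, rss, local_host):
    """Return (handler, machine) for a request to start `app`."""
    if app not in managed_apps:
        # Not on the managed list: the request goes to the local OS.
        return ("local_os", local_host)
    needed_mb = rrd[app]["memory_mb"]  # requirement looked up in the RRD
    # Stand-in for the SA: machines whose RSS status satisfies the requirement.
    fits = [m for m, s in rss.items() if s["free_memory_mb"] >= needed_mb]
    if not fits:
        return ("unschedulable", None)
    target = min(fits, key=lambda m: (rss[m]["load"], m))
    # The CL would now ask the Daemon on `target` to start the process.
    return ("mshn_daemon", target)


managed = {"weather_video"}
rrd = {"weather_video": {"memory_mb": 256}}
rss = {
    "machine1": {"free_memory_mb": 128, "load": 0.1},
    "machine2": {"free_memory_mb": 512, "load": 0.7},
    "machine3": {"free_memory_mb": 1024, "load": 0.4},
}
print(route_request("weather_video", managed, rrd, rss, "machine1"))
print(route_request("text_editor", managed, rrd, rss, "machine1"))
```

With this sample data the managed application is routed to the Daemon on machine3, while the unmanaged one is handed to the local operating system.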

The  final  component  is  the  MSHN  Application  Emulator  (AE).  The  AE’s  purpose 
is  twofold.  First,  it  mimics  a  real  application  without  the  overhead  of  re-coding,  installing, 
maintaining,  and  running  the  real  application.  Its  execution  can  help  in  initially  populating 
the  RRD.  Second,  the  AE  can  be  used  to  sense  the  status  of  resources  upon  which  no 
MSHN  applications  are  running,  ensuring  that  the  RSS  is  being  populated  with  accurate 
data.  What  follows  is  a  more  detailed  description  of  the  MSHN  components  important  to 
this  thesis  research  and  the  purpose  those  components  serve.  The  components  are  the  CL, 
the  RRD,  and  the  RSS. 

The  Client  Library,  commonly  referred  to  as  “the  wrapper,”  was  designed  and 
tested  by  Mathew  Schnaidt  [SCHN98].  It  was  later  modified  and  implemented  by  the 
MSHN  staff.  As  discussed  above,  the  CL  is  linked  to  the  application  in  order  to  intercept 
system calls and report resource usage and resource status to the RRD and RSS,
respectively.  The  following  is  a  list  of  the  resources  that  the  prototype  wrapper  monitored: 

•  Total  runtime 

•  Time  blocked  waiting  for  user  input 

•  Local  I/O 

-  Total  number  of  bytes  read/written 

-  Total  number  of  reads/writes 

•  Network  file  I/O 

-  Time  to  read  from  remote  disk 

-  Total  number  of  bytes  read/written 

-  Total  number  of  reads/writes 

•  Network  I/O 

-  Total  number  of  bytes  read/written 

-  Estimate  of  latency  seen  by  process 

-  Estimate  of  throughput  seen  by  process 

•  Local IPC

-  Total  number  of  bytes  read/written 

-  Total  number  of  reads/writes 

In  addition,  the  following  resource  information  is  available  through  system  calls  and 
utilities: 

•  Total  memory  used 

•  Number  of  page  faults 

•  CPU  time  and  user  time 

The  above  information  is  what  was  included  in  the  CL  in  the  initial  prototype  version. 
Since  then,  Shirley  Kidd,  a  member  of  the  MSHN  staff,  has  modified  the  CL  so  that  it  now 
has the ability to collect the following fine-grained information:

•  User  CPU  time 

•  System  CPU  time 

•  Between  system  call  time 
Currently, the above information is forwarded to and stored in the RRD as raw
data.  For  example,  the  CL  may  record  the  wall-clock  time  for  application  X  as  3.34 
seconds.  The  next  time  application  X  is  executed,  the  CL  may  record  the  wall-clock  time 
as  4.29  seconds.  Each  and  every  time  application  X  is  executed  a  separate  wall-clock  time 
is  recorded  and  maintained  in  the  RRD.  The  same  is  true  for  every  other  resource 
measurement  that  the  CL  collects.  Each  measurement  is  stored  in  the  appropriate  data 
repository  (RRD  or  RSS). 
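
As an illustration of how such raw measurements could be folded into summary statistics without retaining every sample, the following Python sketch uses Welford's online update for the mean and variance. This is a general-purpose technique shown under the assumption that only the first two moments are wanted; it is not necessarily the algorithm developed in this thesis, and the class name is hypothetical.

```python
import math


class RunningStats:
    """Running mean and sample variance via Welford's online update."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self._m2 / (self.count - 1) if self.count > 1 else float("nan")

    @property
    def stdev(self):
        return math.sqrt(self.variance)


wall_clock = RunningStats()
for seconds in (3.34, 4.29):  # the two runs of application X quoted above
    wall_clock.update(seconds)
print(wall_clock.count, wall_clock.mean, wall_clock.variance)
```

Each new measurement updates three numbers in constant time, so the repository would hold a fixed-size summary per application and resource rather than a growing list of raw values. The trade-off is that a mean and variance alone cannot describe a multimodal distribution.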

The RSS actually maintains three types of information: short term, medium term
and  long  term.  This  information  comes  from  either  the  CL  or  a  system  administrator. 
When  the  SA  makes  a  request  concerning  a  particular  resource  to  the  RSS,  the  RSS  returns 
its  most  recent  estimate  of  the  resource’s  current  availability.  The  SA  establishes  callbacks 
to  the  RSS  if  the  resource  availability  thresholds  are  surpassed  or  if  a  CL  update  frequency 
requirement  needs  to  be  met. 

The  design  of  the  RRD  is  similar  to  that  of  the  RSS.  The  RRD  contains 
information  about  resource  usage  of  applications.  This  information  is  passed  to  the  SA. 
The  RRD  establishes  callbacks  to  the  SA  when  thresholds  are  surpassed  and  to  meet  any 
update  frequency  requirements.  The  RRD  receives  its  updates  from  the  CL. 

The  above  are  the  components  central  to  this  thesis.  For  a  detailed  description  of 
all  MSHN  components,  how  MSHN  was  built,  how  MSHN  functions,  and  a  tutorial  on 
how  to  wrap  an  application,  see  Matt  Schnaidt’s  thesis  [SCHN98]. 

D.  THE  NEED  FOR  DISTRIBUTION  STATISTICS  IN  MSHN 

The case for why distribution statistics are needed in MSHN comes from a recent
paper written by Drs. Taylor Kidd and Debra Hensgen of the Naval Postgraduate School.
They show that scheduling algorithms that use both the expected runtimes and their
distributions can provide better schedules than algorithms that rely solely on the
expected runtimes (see [KIDD99]). The paper focuses primarily on the performance of the
load-and-affinity policy, discussed in Chapter I, when the actual runtimes of jobs can differ
from the expected runtimes. The fundamental problem with most RMS scheduling
algorithms is that they assume resource usage is deterministic. This is a poor assumption.
While it is possible for the estimated usage of a resource to equal actual usage, it is not
probable. Through experimentation and common sense, we know that roughly 50% of the
actual usage is less than the expected usage and roughly 50% is more. Figure 6 shows how
the more sophisticated scheduling algorithms currently in use, such as that used in
SmartNet, work. Part A of Figure 6 shows six different jobs run several times on three
different machines. The resource usage statistic we are focusing on in this example is job
runtime. Part B of Figure 6 shows the average (sample mean) of the recorded runtimes of
the jobs on the three machines. Each job is then scheduled to run on the machine upon
which it had the lowest expected runtime (e.g., Job 1 runs fastest on Machine 1).
This schedule is illustrated in Part C of Figure 6. Once the schedule is built, the predicted
time at which the last job completes can be calculated. In our case, this predicted time is
11 time units. If each of the jobs in the schedule runs for exactly its estimated time, the
schedule works well. If any of the jobs in the schedule happen to run for more than their
expected times, problems with the schedule can occur.

[Figure 6, Parts A-C: machines, possible jobs, runtime samples, and sample means]

Figure 6: Scheduling Algorithms Using only the Estimated Mean
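
The mean-only scheduling procedure described for Figure 6 can be sketched as follows. This is a minimal illustration, not SmartNet's actual implementation: the function names and the runtime samples are hypothetical, and the predicted completion time assumes that jobs assigned to the same machine run one after another.

```python
from statistics import mean


def mean_runtimes(samples):
    """Map each (job, machine) pair to the sample mean of its recorded runtimes."""
    return {key: mean(times) for key, times in samples.items()}


def schedule_by_lowest_mean(means, jobs, machines):
    """Assign each job to the machine with the lowest mean runtime.

    Returns the assignment and the predicted time at which the last job
    completes (assumption: jobs on one machine run back to back).
    """
    assignment = {}
    machine_load = {machine: 0.0 for machine in machines}
    for job in jobs:
        best = min(machines, key=lambda machine: means[(job, machine)])
        assignment[job] = best
        machine_load[best] += means[(job, best)]
    return assignment, max(machine_load.values())


# Hypothetical runtime samples (time units), not taken from Figure 6.
samples = {
    ("job1", "m1"): [2.0, 4.0], ("job1", "m2"): [5.0, 7.0],
    ("job2", "m1"): [4.0, 6.0], ("job2", "m2"): [1.0, 3.0],
    ("job3", "m1"): [3.0, 5.0], ("job3", "m2"): [6.0, 6.0],
}
means = mean_runtimes(samples)
assignment, predicted = schedule_by_lowest_mean(
    means, ["job1", "job2", "job3"], ["m1", "m2"]
)
print(assignment, predicted)
```

Note that the prediction uses only the sample means; nothing in this sketch captures how far an actual runtime may fall from its mean.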
One such problem is shown below in Figure 7. In Figure 7, Part A, the (j, m)th
element of the matrix contains the average execution time of n runs of Job j on Machine
m. In our example, the estimated runtime of every job on every machine is
exactly the same (six time units). Because of this, it does not matter where a job is
scheduled. In Part B of Figure 7, Job #1 is scheduled for Machine #1, Job #2 is scheduled
for Machine #2, and Job #3 is scheduled for Machine #3.

It is possible that all the scheduled jobs could run for less than or exactly equal to
their expected runtimes. Since all the jobs then finish either early or right on time, the
time at which the last job completes is less than or equal to six time units and this schedule
will perform well. Unfortunately, jobs can also finish after their expected
execution times. In Part C of Figure 7, the actual job execution times are marked as tick
marks on their runtime distributions on the right side of the figure. We can see that Job #1
finished at three time units, well before its estimated time to complete of six time units.
This early finish does not have a negative effect on the system. Unfortunately, Job #2
finished at eight time units and Job #3 finished at seven time units, both after their expected
time to complete (six time units). This shows that the expected time at which the last
job completes cannot possibly be six time units. In this situation, if the data provided by
either Job #2 or Job #3 were time sensitive, say the end user needed the data at or before the
expected completion time, he would not have that data. In addition, any subsequently
scheduled jobs would be delayed. The second distribution curve, Part D of Figure 7,
shows that even if the actual expected time for the last job to complete were known, in
50% of the runs the actual time at which the last job completes would still be greater than
the expected time. This delay would cause the schedule to be inaccurate, resulting in
potential problems for the system. The final distribution curve, Part E of Figure 7, shows
that for good results, the predicted time at which the last job completes would need to
include roughly 95% of its associated distribution. This is a level of accuracy that MSHN
needs.
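
The effect described above can be illustrated with a short sketch. The expected and actual runtimes are the ones from the Figure 7 example; the history of completion times and the helper names are hypothetical, and the quantile is a simple empirical estimate rather than a method prescribed by this thesis.

```python
import math


def completion_time(runtimes):
    """Time at which the last job completes when each job runs on its own machine."""
    return max(runtimes)


def empirical_quantile(samples, fraction):
    """Smallest observed value that at least `fraction` of the samples do not exceed."""
    ordered = sorted(samples)
    index = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[index]


# Values from the Figure 7 example (time units).
expected_runtimes = [6, 6, 6]
actual_runtimes = [3, 8, 7]
predicted = completion_time(expected_runtimes)
observed = completion_time(actual_runtimes)

# Hypothetical history of observed completion times for the same schedule.
history = [5, 6, 6, 7, 8, 6, 5, 7, 9, 6]
median_estimate = empirical_quantile(history, 0.50)
safe_estimate = empirical_quantile(history, 0.95)

print(predicted, observed, median_estimate, safe_estimate)
```

A prediction near the 50% point is exceeded by about half of the runs, while a prediction taken at the 95% point is exceeded only rarely, at the cost of being more pessimistic.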




[Figure 7 panels: (A) matrix of mean runtimes, six time units for each of Jobs 1-3 on
each of Machines 1-3; (B) Job #1 run on Machine #1, Job #2 run on Machine #2, Job #3
run on Machine #3; (C)-(E) distribution of the actual time at which the last job completes:
50% of actual execution times could still be greater than the expected execution time;
roughly 95% of actual runtimes need to complete before the expected runtime]

Figure  7:  Job  Runtime  Distributions 


The  above  example  shows  that  current  scheduling  algorithms  using  only  the  mean 
in  their  calculations  tend  to  underestimate  the  completion  time  of  a  schedule.  These 
underestimates  are  likely  to  cause  backlogs  and  inefficiencies  in  the  system.  For  this 
reason,  MSHN  employs  scheduling  algorithms  that  use  more  information  and  produce 
better  results.  Stochastic  scheduling  algorithms  have  the  potential  to  meet  these 
requirements.  They  are  algorithms  that  use  higher  moments  and  other  distribution 
information  to  compute  their  schedules.  Such  algorithms  will  make  MSHN  more 
effective.  In  order  to  use  such  algorithms,  MSHN  must  have  some  means  of  producing 
higher order moments and distributions (i.e., of producing distribution statistics). This
thesis  is  an  initial  attempt  at  producing  such  statistics. 
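
As a rough illustration of what "distribution statistics" could look like for a set of raw measurements, the sketch below computes the first few moments of a list of recorded wall-clock times. The function name is hypothetical, the moments are population moments (dividing by n), and the sample values other than 3.34 and 4.29 are invented for the example; this is not the algorithm developed later in the thesis.

```python
import math


def distribution_statistics(samples):
    """Return the mean, variance, standard deviation and skewness of the samples."""
    n = len(samples)
    mean = sum(samples) / n

    def central_moment(order):
        # Population central moment of the given order.
        return sum((x - mean) ** order for x in samples) / n

    variance = central_moment(2)
    std_dev = math.sqrt(variance)
    skewness = central_moment(3) / std_dev ** 3 if std_dev > 0 else 0.0
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "skewness": skewness,
    }


# Wall-clock times in seconds; 3.34 and 4.29 are the values quoted earlier,
# the remaining values are hypothetical.
wall_clock_times = [3.34, 4.29, 3.80, 3.51, 5.90]
stats = distribution_statistics(wall_clock_times)
print(stats)
```

A positive skewness, as in this example, indicates a longer tail of slow runs, which is exactly the information a mean-only scheduler cannot see.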
III.  OVERVIEW  OF  PREVIOUS  WORK


This chapter briefly describes how several different system tools conduct resource
monitoring. Resource monitoring is an integral part of MSHN and most other Resource
Management Systems (RMSs). RMSs, in general, need an agent to provide information on
the availability of resources in the system so that applications can be scheduled. Tools
such as the Network Weather Service also need an agent of sorts to monitor resources so
that they can accurately predict resource performance. As discussed in earlier chapters, the
agent used for resource monitoring in MSHN is the Client Library. The tools and RMSs
described below are the Network Weather Service (NWS), SmartNet, MSHN,
DeSiDeRaTa, Jewel, CONDOR, and Odyssey.

A.  RESOURCE  MONITORING 

Simply put, resource monitoring is a mechanism that permits you to be aware at all
times of what resources you have and how much of each resource is available for use. In
most instances this is easier said than done because of a rapidly changing
environment. The following example helps illustrate the point. One of the many training
exercises a Tank Company Commander (CO) is responsible for is the annual
gunnery qualification. Generally speaking, the resources necessary to accomplish this
mission are his fourteen tank crews, fourteen tanks, a tank gunnery range, ammunition,
fuel, mechanics, and a medical squad.

If the company commander solely owned all of these resources, and had a bird's-eye
view of them at all times, completing the gunnery qualification on time and with good
results would be relatively easy. This, however, is not the case. Most, if not all, of these
resources are shared in some way. For instance, there are several companies vying to use
the same gunnery range, and higher level commands can take away personnel, equipment
and supplies at any time. The CO, then, must employ his “agents” (First Sergeant,
Training Non Commissioned Officer (TNCO), Platoon Leaders and Platoon Sergeants) to
keep him current on resource availability so that he can make informed decisions. The
agents record and report resource usage and availability to the CO. The CO can then use
this information to change his plan accordingly and accomplish his mission to the best of
his ability using the reported available resources. In addition, and perhaps more
importantly, all of the information collected by the CO’s agents is used in an After Action
Review (AAR) when the gunnery is complete. The AAR is conducted so that the CO and
his staff can record and save lessons learned, good and bad, from the exercise. The lessons
learned can then be applied to the planning and execution of the next annual gunnery in
hopes of improving overall performance.

While this system of employing agents is good, it is not perfect. One of the biggest
problems is getting accurate information to the CO in a timely manner. For example,
suppose the TNCO knows that the unit is running low on ammunition but waits until the
last minute to tell the CO. In this situation, the unit will experience considerable downtime,
because it must wait while more ammunition is brought out to the range. If, on the other
hand, the TNCO notifies the CO of the ammunition problem in a timely manner, the CO
can have the ammunition brought out sooner and have it waiting at the range. This same
example can be applied to many of the above resources (fuel, supplies, etc.). Accurate
reporting is also important. If the TNCO reports to the CO that the gunnery range is
available for the two-week period that the CO wants to train, but in reality the range is only
available for the first week, the gunnery will not be completed on time. Again, this
situation applies to more than just the range resource. For the most part, the CO’s agents
are both timely and accurate. The reason these agents are timely and accurate is that, in
this case, the agents can physically be at the resource location and see and report exactly
what the resource availability is without getting in the way (i.e., affecting the resource
being measured). His agents also have several different lines of communication for
reporting. One of the most important points in the example above is that the CO is getting
near perfect information on his resources without suffering any mission performance
degradation. This is a hard thing to accomplish in a distributed computing environment.

RMSs and tools that perform resource monitoring in a distributed computing
environment work, in theory, in much the same way as in the example above. These
tools gather information about the resources in their environment and use the information
for everything from scheduling to forecasting. One of the challenges that these tools must
overcome is to monitor the information without adding overhead. For example, running
the ping program is a way to estimate network throughput between two machines. Upon
execution, ping sends packets between two connected computers. When ping completes, it
records the number of bytes sent and the round trip time, from which we can get an
estimate of throughput. The problem with running ping is that it places an additional load
on the resource being measured, leading to inaccuracies. To get a better idea of how this
challenge and others are overcome, the following sections describe several system
application tools and how they measure resources.


1.  Network  Weather  Service  (NWS) 

The NWS is an application designed to sense the performance of resources
throughout its environment and, based on this information, to provide forecasts of the
future performance of those resources [WOLS97]. NWS collects the resource
information via three different sensors: a CPU sensor, a network link sensor, and a
memory sensor. These sensors send their information to a subsystem where the data is
preprocessed and passed to a database for use as input by up to three possible
forecasting methods. The rest of this section describes how the sensors in NWS gather
information.

As  with  most  data  gathering  agents,  the  NWS’s  CPU  and  Network  sensors  aim 
to  limit  their  intrusiveness  when  taking  measurements,  thus  lessening  the  opportunity 
for  hindering  the  performance  of  the  applications  that  are  executing. 

The NWS CPU sensor uses a combination of techniques to measure CPU
availability. First, two Unix system utilities, uptime and vmstat, are used. The
CPU Sensor calls uptime, takes the one-minute load average, and calculates the fraction
of the CPU a process would get if it were run at that particular time. When the CPU
Sensor calls the Unix system utility vmstat, the fraction of the CPU a process would
get is calculated using a combination of idle time, user time and system time
measurements. The authors of [WOSH97] point out that, while these utilities do not
add a significant amount of overhead, each can leave out significant information, which
could lead to inaccurate measurements of available CPU. The example that
[WOSH97] uses is that neither utility can provide information on the priority of
any of the currently running processes. The CPU Sensor is designed to make up for
these shortcomings by running a type of daemon that mimics a compute-intensive
program. For example, the available CPU can be calculated as the ratio of the
observed occupied CPU to the execution (wall-clock) time of the daemon. The Sensor
then compares the results from this process to the measurements of uptime and
vmstat and takes the more accurate information of the group. The obvious problem
with running the daemon is that this procedure adds overhead. For this reason, the
daemon is not run as often as the Unix utilities. The CPU Sensor, at times, runs all
three processes simultaneously. This serves a couple of purposes. First, when all three
processes are executed at the same time, the daemon process result is taken as fact.
Second, the daemon process results are used to bias the Unix utility results in order to
keep them more accurate until the next daemon is executed.

To keep the number of daemon executions to a minimum, the Sensor applies
special heuristics. In general, as long as the Unix utility results are relatively stable,
the daemon does not need to run as often. When the Unix utility results vary
significantly, the daemon must run to get the most accurate estimate of CPU
availability. For a more detailed explanation of how the NWS CPU Sensor works, see
[WOSH97].
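
The ratio used by the daemon-style probe, occupied CPU time divided by wall-clock time, can be sketched as follows. This is only an illustration of the idea and not the NWS implementation: the function name and the amount of CPU time consumed are assumptions, and the probe measures the share of CPU obtained by the current Python process.

```python
import time


def probe_cpu_availability(cpu_seconds=0.05):
    """Spin until this process has consumed `cpu_seconds` of CPU time.

    Returns occupied CPU time divided by elapsed wall-clock time, capped
    at 1.0. On an idle machine the ratio is close to 1; on a busy machine
    the process waits for the CPU and the ratio drops.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    # Busy loop that mimics a compute-intensive program.
    while time.process_time() - cpu_start < cpu_seconds:
        pass
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return min(1.0, cpu_used / wall_used)


availability = probe_cpu_availability()
print(f"estimated CPU availability: {availability:.2f}")
```

The sketch also makes the overhead problem visible: while the probe is spinning, it competes for the CPU with the applications it is meant to help schedule, which is why such a probe would be run sparingly.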

As mentioned earlier, NWS also deploys a set of Network Sensors. The
Network Sensors work in much the same way as the daemon process described in the
CPU Sensor section above. Unlike the CPU Sensor, there is only one possibility for
taking network measurements in NWS. The single method of network measurement
stems from the lack of available and consistent performance data between arbitrary
machines.

NWS Network Sensors record three measurements across a network link:
latency, throughput, and effective throughput. Every machine in the NWS environment
executes a copy of the NWS server. This server maintains a list of all machines
being monitored and the TCP port that the server is connected to. Each server at some
point receives a token. The token is basically a permission slip that allows the server to
conduct a sampling of a network link. The server selects a host from its list and sends
a round-trip, single-word packet to that host. When the packet returns, an estimate of
latency is calculated by dividing the round-trip time by two. Once the latency
estimate is complete, the server sends another packet with a specified amount of data
to the same host and times the transfer. The throughput can then be calculated by
dividing the data size by the transfer time. Finally, effective throughput can be
calculated by dividing the data size by the result of subtracting the latency from the
transfer time. At this point, all three measurements can be stored for later use. This
technique can be intrusive, so the designers of NWS organized the network sensors in a
specific way to limit the number of samples that are taken. For specifics on the
Network Sensor Hierarchy, see [WOSH97, WOLS97].
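
The three network measurements can be written out as a small worked example. The sketch below is an illustration only: the function names and the sample timings are hypothetical, and the effective throughput takes its denominator as the transfer time minus the latency, so that the result is positive.

```python
def estimate_latency(round_trip_time):
    """Latency estimate: half of the measured round-trip time (seconds)."""
    return round_trip_time / 2.0


def estimate_throughput(data_size, transfer_time):
    """Throughput: data size divided by the timed transfer (bytes per second)."""
    return data_size / transfer_time


def estimate_effective_throughput(data_size, transfer_time, latency):
    """Effective throughput: data size divided by (transfer time - latency)."""
    return data_size / (transfer_time - latency)


# Hypothetical probe results: a 4 ms round trip for the single-word packet
# and 102 ms to transfer 64,000 bytes.
latency = estimate_latency(0.004)
throughput = estimate_throughput(64_000, 0.102)
effective = estimate_effective_throughput(64_000, 0.102, latency)

print(f"latency              {latency:.3f} s")
print(f"throughput           {throughput:,.0f} bytes/s")
print(f"effective throughput {effective:,.0f} bytes/s")
```

With these numbers the effective throughput is slightly higher than the raw throughput, because the time spent on latency is removed from the transfer time before dividing.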

2.  SmartNet 

SmartNet  is  covered  extensively  in  Chapter  I.  Therefore,  in  this  section  we  will 
simply  present  the  SmartNet  Architecture,  Figure  8  below,  and  briefly  describe  how 
SmartNet  computes  and  uses  the  average  runtimes  of  applications  for  the  scheduling  of 
subsequent  runs. 
Figure  8:  SmartNet  Architecture  From  [KIDD96]
To understand how SmartNet computes the runtime estimates for subsequent runs
of an application, consider the following example. A user submits an application to
SmartNet, which includes as parameters an estimate of the expected runtime equal to two
minutes and a weight equal to 80%. (The weight represents how sure the user is of the
estimated runtime he is providing.) When the application starts, SmartNet starts a timer to
measure wall-clock time (WCT). When the application completes, SmartNet stops the
timer and records the WCT. Let WCT = 150 sec for this example. SmartNet then
computes a running sample average of the WCT and a sample standard deviation of the
WCT. To keep the example simple, we ignore the sample standard deviation and let the
sample average WCT = 150 sec. The above information is stored in the SmartNet Database
for later use by the SmartNet Scheduler. On subsequent runs of the above application, the
SmartNet Scheduler would use the following equation to compute the input parameter.

Parameter = Weight x (User's Estimated Runtime) + (1 - Weight) x (Sample Average WCT)

Parameter = 0.80 x (120 sec) + (1 - 0.80) x (150 sec)

Parameter = 96 sec + 30 sec

Parameter = 126 sec

Concluding  our  example,  the  SmartNet  Scheduler  would  use  126  seconds  as  its  input 
parameter. 
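
The weighted combination in this example is easy to check in code. The sketch below is a direct transcription of the equation above; the function name is hypothetical and is not part of SmartNet.

```python
def smartnet_runtime_parameter(user_estimate, weight, sample_average_wct):
    """Blend the user's runtime estimate with the observed sample average WCT.

    `weight` is the user's confidence in the estimate, between 0 and 1.
    All times are in seconds.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * user_estimate + (1.0 - weight) * sample_average_wct


parameter = smartnet_runtime_parameter(
    user_estimate=120.0, weight=0.80, sample_average_wct=150.0
)
print(f"input parameter: {parameter:.0f} sec")
```

A weight of 1 would reproduce the user's estimate unchanged, and a weight of 0 would rely entirely on the measured sample average.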


3.  DeSiDeRaTa 

DeSiDeRaTa  is  a  Defense  Advanced  Research  Projects  Agency  (DARPA)  and 
Naval  Surface  Warfare  Center  (NSWC)  sponsored  software  development  project  at  the 
University  of  Texas  at  Arlington.  It  has  the  goals  of  providing  resource  and  Quality  of 
Service  (QoS)  management  for  Dynamic,  Scalable,  Dependable  Real-Time  Systems, 
hence  the  name  DeSiDeRaTa.  In  short,  DeSiDeRaTa  aims  to  manage  resources  and 
real-time  applications  in  a  distributed  shipboard  computing  systems  domain.  The 
difference  between  DeSiDeRaTa  and  previous  works  is  that  it  accounts  for  complex 
features such as variable periods, sporadic processes, priorities, fault management and
scalability  [DESI]. 

Of  interest  to  us  is  how  DeSiDeRaTa  conducts  resource  monitoring. 
DeSiDeRaTa  has  several  components  to  manage  and  monitor  resources  and 
applications  in  its  environment.  Figure  9  below  is  the  author’s  interpretation  of  how 
the  components  work  together  to  gather  the  needed  resource  information  and  make 
appropriate  adjustments. 


Figure  9:  Resource  Information  Flow  of  DeSiDeRaTa 

To  monitor  hardware  in  its  domain,  DeSiDeRaTa  employs  a  daemon  on  each 
host.  This  daemon  is  a  hardware  monitor  that  periodically  collects  elementary  load 
metrics  for  the  host  and  the  LAN.  Examples  of  the  above  metrics  are  CPU  queue 
length,  free  memory  and  context  switches.  The  metrics  collected  by  the  Hardware 
Monitor are passed to the Hardware Broker. The Hardware Broker uses the metrics to
calculate  an  aggregate  load  index  for  each  machine.  The  Hardware  Broker  then 
updates  the  Hardware  Analyzer  with  the  load  index.  The  analyzer  then  serves  as  a 
database  for  this  information.  Finally,  the  Resource  Manager  can  use  this  information 
to  determine  how  best  to  re-allocate  resources  to  improve  real-time  QoS.  To  get  a 
complete  description  of  the  other  DeSiDeRaTa  components,  see  the  DeSiDeRaTa 
manual  [DESI]. 

4.  Condor 

Condor is a system developed at the University of Wisconsin-Madison that
locates machines with idle CPU cycles in its environment and
schedules jobs to execute on them. Figure 10 shows how Condor is organized.
Figure  10:  The  Condor  Pool  From  [SCHN98]
In the Condor system, jobs are submitted to the Central Manager where they are
queued  up  until  an  idle  machine  is  available.  In  Figure  10,  the  machines  labeled  “Idle 
Machine”  are  available  for  use  by  any  job  in  the  Condor  system.  The  machine  labeled 
“Remote  Execution  Machine”  has  a  job  assigned  by  the  central  manager  executing  on 
it.  The  machine  labeled  “Submitting  Machine”  is  where  the  job  that  is  executing  on 
the  Remote  Execution  Machine  originated. 

Resource monitoring in the Condor system is similar to resource monitoring in
MSHN. In fact, some of the ideas for the MSHN Client Library (CL) came from
Condor, in particular, the idea of wrapping system calls and statically linking jobs with
libraries. Condor, like MSHN, operates by intercepting system calls made by user
applications. The Condor Library, operating on the Remote Execution Machine,
intercepts the application system call and sends it back to the Submitting Machine
where the call can be processed. Once the system call has completed processing, the
results are passed back to the Condor Library on the Remote Execution Machine,
where the library returns the results to the user’s system call. In addition to tracking
application resource requirements, Condor also employs daemons to constantly
monitor system activity. When a workstation is no longer available, these daemons send
a signal to the Condor Library. When the library receives the signal,
it writes information to a “checkpoint file” and terminates the process. The
“checkpoint file” contains state information so that when the process starts
on another machine, it can restart where it terminated. For further details on how the
Condor system works, see [LITZ97].

5.  Odyssey 

A team from Carnegie Mellon University designed the software platform
Odyssey to manage resources (i.e., the available network bandwidth and network
quality) in a mobile computing environment. Odyssey is a system that interfaces with
an application, determines the amount of resources needed by the application,
and, if those resource requirements cannot be met, allows the application to adapt and
execute in a degraded mode. A similar situation occurs if more of a resource becomes
available; the application can adapt and execute in an upgraded mode. The ability of an
application to adapt depends heavily on Odyssey’s ability to monitor the resources in
its environment. The following paragraphs explain how Odyssey
monitors resources in its environment.

Odyssey can interface with its applications in two distinct ways. First, if the
source code is readily available, then it (the source code) can be modified to interface
directly with Odyssey components (see Figure 11). Second, if the source code is not
available, then Odyssey employs a module called “cellophane” that seamlessly
transforms application requests into Odyssey objects. These techniques are key for
resource monitoring in Odyssey. Figure 11 is a graphic representation of the
Odyssey client architecture, adapted from [NOBL97], and is useful in explaining how
Odyssey works.


Figure  11:  Odyssey  Client  Architecture  After  [NOBL97] 
The Odyssey components consist of the Interceptor, the Viceroy and any number of
Wardens (see Figure 11). The Interceptor intercepts all system calls made by the
applications and passes them to the Viceroy. The Viceroy serves a couple of purposes. It
is responsible for monitoring system performance and resource usage, and it advises clients
on adapting. The Viceroy gets an application’s resource requirements when the
application starts. The requirements are recorded and used for comparison against the
readings of actual resource availability. The Wardens are responsible for communicating
information between the Viceroy and clients, and they manage data types associated with
their server. The server, in this case, is a remote program that provides data or processing
for the mobile users [SCHN98]. For a full explanation and examples of how Odyssey
works, see [NOBL97].

6.  MSHN 

MSHN’s agent for monitoring resources is the Client Library (CL). The CL takes a
relatively novel approach to monitoring resources and is explained in detail in Chapter V.

B.  STOCHASTIC  SCHEDULING 

Stochastic scheduling is an extremely active research area and the work in this
thesis closely meshes with it. Scheduling problems are inherently hard. They become
even harder when uncertainty is introduced into the problem. For example, scheduling
applications in a heterogeneous distributed network is hard even if the scheduler knows
everything. Scheduling in a heterogeneous distributed network becomes even harder when
there is some uncertainty in the availability of resources and in application or task times.
There has been a significant amount of research in this area, and this section was written
simply to give reference to that work in case the reader is interested. The five fairly recent
papers that we reference for this thesis are [ROSS91], [BECK00], [LIUR98], [BRES96] and
[KAUF97].
C.  SUMMARY


In this chapter, we discussed several existing tools that perform resource
monitoring for one reason or another. We discussed how each of the tools conducts
monitoring and some of the challenges that these tools had to overcome. For example, we
saw that it was not entirely possible to eliminate the problem of intrusiveness in NWS, but
the degree of intrusiveness could be limited. In Chapter V, we will see, in detail, how
MSHN conducts resource monitoring and deals with the many challenges. In Section B,
we pointed to some of the current research on stochastic scheduling.
IV.  FORMAL  DEFINITION  OF  THE  PROBLEM


A.  PROBLEM 

The  primary  goal  of  this  thesis  is  to  study  possible  methods  of  aggregating  data 
collected  by  the  Client  Library  (CL)  for  later  storage  in  the  Resource  Requirements 
Database  (RRD)  and  the  Resource  Status  Server  (RSS).  The  use  of  statistics  offers  an 
excellent  solution  to  our  aggregation  problem,  and  being  able  to  accurately  predict  the 
distribution of the data allows MSHN to potentially make use of powerful
distribution-specific methods. There are several methods available to compute statistics and
distributions.  The  problem  is  determining  which  method(s)  provide  the  best  (or  most 
accurate)  information.  In  addition,  we  want  to  obtain  an  initial  estimate  of  how  much,  if 
any,  of  the  raw  data  generated  by  the  CL  we  need  to  maintain  in  the  RRD  and  RSS. 

B.  APPROACHES  TO  SOLVING  THE  PROBLEM 

For reasons discussed above and in Chapter II, MSHN desires to be able to
accurately  predict  the  distribution  of  the  data  collected  by  the  CL.  Several  “goodness  of 
fit”  tests  exist  that  MSHN  could  use  to  predict  these  distributions.  One  of  the  most 
significant  challenges  in  predicting  a  distribution  is  first  determining  which  test  to  use. 
The  tests  chosen  in  this  thesis  are  the  Kolmogorov-Smimov  test,  the  Anderson-Darling 
test,  and  the  Dirichlet  Process.  Each  test  is  described  in  detail  below.  Included  in  the 
descriptions  are  the  inherent  advantages  and  disadvantages  of  the  method  and  why  the 
particular  method  was  chosen  for  use  in  this  thesis.  The  descriptions  are  followed  by  a 
brief  summary  of  this  chapter. 

1.  Kolmogorov-Smirnov  Test  (K-S  test) 

By definition, the K-S test examines whether a given sample of n observations is
from a specified continuous distribution [JAIN91]. The K-S test measures the maximum
deviation of the observed cdf from the theoretical cdf. This value, known as the K-S
statistic, is compared to a critical value found in tables giving the quantiles of the K-S
distribution. If the K-S statistic is less than the table value, we cannot reject the
hypothesis that the sample is from the hypothesized distribution at the specified level
of significance. Figure 12 shows the measured difference between a hypothesized
distribution and an observed distribution. Note that the maximum deviation could be
above or below the hypothesized distribution.


Figure 12: Hypothesized Distribution vs. Observed Distribution After [LAWK91]

The K-S goodness-of-fit test was chosen for use in this research for several reasons.
Most significant are that the K-S test (1) does not require any grouping of the data and
therefore no information is lost, (2) can be used for any sample size, and (3) tends to be
more powerful than tests such as the chi-square test. Like all other “goodness of fit” tests
researched, the K-S test has a possible drawback. According to [LAWK92], the original
form of the K-S test is valid if and only if all of the parameters of the hypothesized
distribution are known and the distribution is continuous. The data collected by the CL is
continuous, but the population parameters are impossible to know. Therefore, the
parameters must be estimated based on the sample data provided by the CL. Fortunately,
the K-S test has been extended specifically to accommodate the estimation of parameters for
the following four distributions: normal, log-normal, exponential and Weibull. The
characteristics of previous data collected by predecessors to the CL lead us to believe that
the distributions are likely exponential or Gaussian. Therefore, these are the distributions
we will test our samples against. Because both are covered by the extended test, restricting
ourselves to the Gaussian and exponential distributions also eliminates the above drawback
of the K-S test. To illustrate how the test works, a simple hypothetical example is
provided below.

Suppose the CL collects the following wall-clock runtimes for ten executions of
application Y. We want to test if the sample data is from a population with a Gaussian
distribution where the parameters $\mu$ and $\sigma^2$ are unknown (i.e., the sample is
from $N(\mu, \sigma^2)$). The data below was actually generated using Microsoft Excel’s
random number generator.


Application Y Run #    Recorded Wall-Clock Time
         1                     10.54
         2                      9.16
         3                      9.42
         4                      9.42
         5                      9.41
         6                     10.51
         7                     10.05
         8                      9.70
         9                     10.09
        10                      8.59

Table 2: Sample Data from a Normal Population


$H_0$, our null hypothesis, is that our sample data is from a normal distribution.
The inequality

$$\left(\sqrt{n} - 0.01 + \frac{0.85}{\sqrt{n}}\right) D_n > c_{1-\alpha}$$

is the formula used to test if our hypothesis should be accepted or rejected. If the
above inequality is true, we must reject $H_0$ in favor of $H_a$, the alternative
hypothesis (that the above data is not from a normal distribution). Our formula
consists of three separate components: $\left(\sqrt{n} - 0.01 + \frac{0.85}{\sqrt{n}}\right)$,
$D_n$, and $c_{1-\alpha}$.

$\left(\sqrt{n} - 0.01 + \frac{0.85}{\sqrt{n}}\right)$ is an adjustment to the test statistic
to avoid having to use very large look-up tables. $D_n$ is the actual K-S statistic. The K-S
statistic is the largest vertical distance between the hypothesized and empirical
distribution, $D_n = \sup_x \{\, |F_n(x) - F(x)| \,\}$,
where $F_n(x)$ is the empirical distribution and $F(x)$ is the hypothesized distribution.
Because we are assuming that the above data comes from a population with a normal
distribution, where the population mean ($\mu$) and the population standard deviation
($\sigma$) are unknown (i.e., $F = N(\mu, \sigma^2)$), we must use a special case of the
K-S test. We must first estimate both $\mu$ and $\sigma$ by the sample mean ($\bar{X}(n)$)
and the sample standard deviation ($\sqrt{S^2(n)}$). The calculation to determine $F(x)$ is
as follows: $F(x) = \Phi\{[x - \bar{X}(n)] / \sqrt{S^2(n)}\}$,
where $\Phi$ is the distribution function of the standard normal. Each data point, x, from
our sample is input into the above formula, a decimal number is calculated, and that number
is used to look up the value of $F(X_{(i)})$ in a standard normal table. From our example,
for $F(X_{(1)})$, we start with $x = 8.59$, $\bar{X}(n) = 9.689$, and $\sqrt{S^2(n)} = 0.615$.
Substituting into $F(X_{(1)}) = \Phi\{[x - \bar{X}(n)] / \sqrt{S^2(n)}\}$, we get
$F(X_{(1)}) = \Phi\{-1.788\}$. Looking up -1.78 in the standard normal table, we get
$F(X_{(1)}) = 0.0375$. The results for $F(X_{(i)})$ are listed below.
 i (ascending order)   X(i)    [X(i) - X̄(n)] / sqrt(S²(n))   F(X(i))
          1             8.59            -1.788                0.0375
          2             9.16            -0.861                0.1949
          3             9.41            -0.454                0.3264
          4             9.42            -0.438                0.3336
          5             9.42            -0.438                0.3336
          6             9.70             0.018                0.5040
          7            10.05             0.587                0.7190
          8            10.09             0.652                0.7422
          9            10.51             1.336                0.9082
         10            10.54             1.385                0.9162

Table 3: Results of F(X(i)), i = 1 to 10


With the results from above, we can now finish our calculation of $D_n$. Because
our data is in ascending order, we can compute

$$D_n^+ = \max_{1 \le i \le n}\left\{\frac{i}{n} - F(X_{(i)})\right\} \quad\text{and}\quad D_n^- = \max_{1 \le i \le n}\left\{F(X_{(i)}) - \frac{i-1}{n}\right\}.$$

$D_n$ is selected by taking the largest value of all of the $D_n^+$ and $D_n^-$ terms
computed. The terms for $i = 1$ are calculated as an example:
$\frac{1}{10} - 0.0375 = 0.0625$ for $D_n^+$ and $0.0375 - \frac{0}{10} = 0.0375$ for
$D_n^-$, so at this point $D_n = 0.0625$. Upon completion of the rest of
the calculations for $D_n$, we find our result to be $D_n = 0.1664$. We now must apply the
adjustment, $\left(\sqrt{n} - 0.01 + \frac{0.85}{\sqrt{n}}\right)$, to our test statistic.
Our formula is now

$$\left(\sqrt{10} - 0.01 + \frac{0.85}{\sqrt{10}}\right) \times 0.1664 > c_{1-\alpha} \;\Rightarrow\; 3.4211 \times 0.1664 > c_{1-\alpha} \;\Rightarrow\; 0.5693 > c_{1-\alpha}.$$

Looking up the modified critical value, $c_{1-\alpha}$, we see that 0.5693 is less than each
of 0.775, 0.819, 0.895, 0.955 and 1.035.


Table  4:  Modified  Critical  Values  for  Adjusted  K-S  Test  Statistics  From  [LAWK91] 

This means that at significance levels of $\alpha$ = .15, .1, .05, .025 and .01 (i.e., at
85%, 90%, 95%, 97.5% and 99% confidence levels), we accept (that is, we fail to reject) the
hypothesis that our data is from a normal distribution.
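
The normal-case calculation above can be sketched in a few lines of code. The following is a minimal illustration in Python, not the Java code written for this thesis, and the function names are our own. It assumes the adjustment $(\sqrt{n} - 0.01 + 0.85/\sqrt{n})$ and the five modified critical values quoted above. It also evaluates the standard normal distribution function directly instead of using a two-decimal table lookup, so the $D_n$ it reports for the Table 2 data is close to, but not identical to, the hand-calculated value.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_statistic(cdf_values):
    """D_n for hypothesized cdf values F(X_(i)) listed in ascending order of X."""
    n = len(cdf_values)
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf_values))
    d_minus = max(f - i / n for i, f in enumerate(cdf_values))
    return max(d_plus, d_minus)


def ks_normal(sample):
    """Return (D_n, adjusted statistic) for H0: normal with estimated mean and variance."""
    n = len(sample)
    mean = statistics.mean(sample)
    std = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    cdf_values = [normal_cdf((x - mean) / std) for x in sorted(sample)]
    d_n = ks_statistic(cdf_values)
    adjusted = (math.sqrt(n) - 0.01 + 0.85 / math.sqrt(n)) * d_n
    return d_n, adjusted


# Table 2 runtimes and the modified critical values quoted in the text,
# keyed by significance level alpha.
runtimes = [10.54, 9.16, 9.42, 9.42, 9.41, 10.51, 10.05, 9.70, 10.09, 8.59]
critical = {0.15: 0.775, 0.10: 0.819, 0.05: 0.895, 0.025: 0.955, 0.01: 1.035}

d_n, adjusted = ks_normal(runtimes)
rejected_at = [alpha for alpha, c in critical.items() if adjusted > c]
print(f"D_n = {d_n:.4f}, adjusted = {adjusted:.4f}, rejected at alpha in {rejected_at}")
```

An empty `rejected_at` list corresponds to the conclusion reached above: the normal hypothesis is not rejected at any of the five levels.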

The K-S test can also be used to test that sample data came from a population with
an exponential distribution. In this case, the hypothesized distribution is
expo($\beta$), where $\beta$ is unknown. We must estimate the parameter $\beta$ by
$\bar{X}(n)$. $F(x)$ is now defined to be the expo($\bar{X}(n)$) distribution function and
therefore $F(x) = 1 - e^{-x/\bar{X}(n)}$ for $x \ge 0$. Using the data
from our example above, we can now test the $H_0$ (that the sample data is exponentially
distributed). $D_n$ is equal to the larger value of

$$D_n^+ = \max_{1 \le i \le n}\left\{\frac{i}{n} - \left(1 - e^{-X_{(i)}/\bar{X}(n)}\right)\right\} \quad\text{and}\quad D_n^- = \max_{1 \le i \le n}\left\{\left(1 - e^{-X_{(i)}/\bar{X}(n)}\right) - \frac{i-1}{n}\right\}.$$

The adjustment for our statistic, $D_n$, is
$\left(D_n - \frac{0.2}{n}\right)\left(\sqrt{n} + 0.26 + \frac{0.5}{\sqrt{n}}\right)$ from Table 4 above.
Based on our sample data, $D_n = 0.58793$, $n = 10$, and
$\left(D_n - \frac{0.2}{n}\right)\left(\sqrt{n} + 0.26 + \frac{0.5}{\sqrt{n}}\right) = 2.0334$.
From Table 4 above, we see that the null hypothesis must be rejected in favor of the
alternative at the 85, 90, 95, 97.5 and 99% confidence levels, because the value for our
adjusted statistic is larger than the critical values in Table 4.
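
The exponential case follows the same pattern with a different hypothesized cdf and a different adjustment. The sketch below is again an illustrative Python version with our own function name, not the thesis implementation. It assumes the estimate $\bar{X}(n)$ for the exponential mean and the adjustment $(D_n - 0.2/n)(\sqrt{n} + 0.26 + 0.5/\sqrt{n})$ given above. The exponential critical values of Table 4 are not repeated in the code, so it only reports the statistics.

```python
import math


def ks_exponential(sample):
    """Return (D_n, adjusted statistic) for H0: exponential with estimated mean."""
    n = len(sample)
    mean = sum(sample) / n
    # Hypothesized cdf evaluated at the order statistics X_(1) <= ... <= X_(n).
    cdf_values = [1.0 - math.exp(-x / mean) for x in sorted(sample)]
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf_values))
    d_minus = max(f - i / n for i, f in enumerate(cdf_values))
    d_n = max(d_plus, d_minus)
    adjusted = (d_n - 0.2 / n) * (math.sqrt(n) + 0.26 + 0.5 / math.sqrt(n))
    return d_n, adjusted


runtimes = [10.54, 9.16, 9.42, 9.42, 9.41, 10.51, 10.05, 9.70, 10.09, 8.59]
d_n, adjusted = ks_exponential(runtimes)
print(f"D_n = {d_n:.5f}, adjusted statistic = {adjusted:.4f}")
```

For this data the largest deviation occurs at the smallest observation: an exponential distribution with mean 9.689 already places well over half of its probability below 8.59, while the empirical cdf is still at zero just before that point.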

2.  Anderson-Darling  Test  (A-D  test) 

The Anderson-Darling test is similar to the K-S test in that it has a similar format
and follows the same basic steps. In general, we must (1) determine the value of the A-D
test statistic, which is denoted as $A_n^2$, (2) make the appropriate adjustment according
to the hypothesized distribution we are using, and (3) compare the computed value to the
modified critical values for the adjusted A-D test statistic.

$$A_n^2 = \left(-\left\{\sum_{i=1}^{n} (2i - 1)\left[\ln Z_i + \ln(1 - Z_{n+1-i})\right]\right\} / n\right) - n$$

Here $Z_i = F(X_{(i)})$ is the hypothesized distribution function evaluated at the i-th
smallest observation. The A-D test has the ability to detect discrepancies in the tails of
distributions, something the K-S test cannot do. The A-D test is also a higher-powered test
than the K-S test. For these reasons, the A-D test is included in this research. This
particular form of the A-D test was selected because it allows the use of a much smaller
critical value table (see [LAWK91]). The smaller critical value table made coding the test
much easier. Examples of the A-D test, using the normal distribution and exponential
distribution cases, are conducted below. In the first instance, we test the normal
distribution. In this example, let $n = 10$, $A_n^2 = 0.02579$, and the adjustment
$\left(1 + \frac{4}{n} - \frac{25}{n^2}\right) = 1.15$. The adjusted
test statistic is then (1.15)(0.02579) = 0.02966. From Table 5 below, we can see that our
hypothesis is accepted at all of the levels of significance.
                                                              1-α
Case                    Adjusted Statistic            0.900   0.950   0.975   0.990
All parameters known    A²(n) for n >= 5              1.933   2.492   3.070   3.857
N(X̄(n), S²(n))          (1 + 4/n - 25/n²) A²(n)       0.632   0.751   0.870   1.029
expo(X̄(n))              (1 + 0.6/n) A²(n)             1.070   1.326   1.587   1.943

Table 5: Modified Critical Values for Adjusted A-D Test Statistics From [LAWK91]

Conducting the A-D test for the exponential case with the above sample data
returns the following results: $n = 10$, $A_n^2 = 4.0704$, adjustment
$= 1 + \frac{0.6}{n} = 1.06$. The adjusted test
statistic is then (4.0704)(1.06) = 4.3146. From Table 5 above, we can see that our
hypothesis is rejected at all of the levels of significance. In both the normal and
exponential distribution cases, the A-D test and the K-S test agree: each properly accepts
the normal hypothesis and rejects the exponential hypothesis. Our next method takes a
different approach to solving the problem.
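
The A-D statistic is also straightforward to compute once the hypothesized cdf has been evaluated at the sorted observations. The sketch below is an illustrative Python version, not the thesis's Java code; the names `anderson_darling`, `ADJUSTMENT`, and `EXPONENTIAL_CRITICAL` are our own. It assumes the usual form of the statistic, with $Z_i$ equal to the hypothesized cdf at the i-th smallest observation, and takes the adjustments and the exponential critical values from Table 5. It is applied here to the exponential case for the sample data of Table 2.

```python
import math


def anderson_darling(z):
    """A^2_n for z = [F(X_(1)), ..., F(X_(n))], the hypothesized cdf at the sorted data."""
    n = len(z)
    total = sum(
        (2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
        for i in range(1, n + 1)  # i is 1-based; z[n - i] is Z_(n+1-i)
    )
    return -total / n - n


# Adjustment factors and critical values as listed in Table 5.
ADJUSTMENT = {
    "normal": lambda n: 1.0 + 4.0 / n - 25.0 / n ** 2,
    "exponential": lambda n: 1.0 + 0.6 / n,
}
EXPONENTIAL_CRITICAL = {0.10: 1.070, 0.05: 1.326, 0.025: 1.587, 0.01: 1.943}

runtimes = [10.54, 9.16, 9.42, 9.42, 9.41, 10.51, 10.05, 9.70, 10.09, 8.59]
n = len(runtimes)
mean = sum(runtimes) / n
z = [1.0 - math.exp(-x / mean) for x in sorted(runtimes)]

a2 = anderson_darling(z)
adjusted = ADJUSTMENT["exponential"](n) * a2
rejected_at = [alpha for alpha, c in EXPONENTIAL_CRITICAL.items() if adjusted > c]
print(f"A2 = {a2:.4f}, adjusted = {adjusted:.4f}, rejected at alpha in {rejected_at}")
```

The same function serves the normal case if `z` is instead built from the standard normal distribution function and the `"normal"` adjustment is applied.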




3.  Non-Parametric  Bayesian  Approach  (Dirichlet  process) 

The Dirichlet process takes a slightly different approach than the above “goodness
of fit” tests. Instead of testing if our data is from a particular distribution, we compute
an estimate of the actual distribution of the data. The formula for this process is

$$\hat{F}(x) = \frac{\alpha}{\alpha + n}\,F_0(x) + \frac{n}{\alpha + n}\,F_n(x).$$

The formula has three significant parts. $F_0(x)$ is a prior guess of the distribution
(i.e., for $P(X \le x)$) where, for example, $X \sim N(\mu = 10, \sigma = 2)$. $F_n(x)$ is
the empirical cumulative distribution function (cdf). This is an estimate of the true cdf,
$F(x)$.

$$F_n(x) = \frac{\text{number of } X_i \le x}{n}, \quad \text{where } n = \text{number of data elements}.$$

Finally, $\alpha$ is a weight put on $F_0(x)$, the prior guess. It is measured in number of
observations' worth. As an example, if we have 10 data elements and apply an $\alpha$
(weight) of 20 to our guess of the distribution, then what we are implying is that we are
relatively confident that we have made an accurate guess as to what the real distribution
is. In this example, our guess is worth 20 observations. If, on the other hand, we apply an
$\alpha$ of 5, then we are implying that we are not very confident in our guess of what the
underlying distribution is and, therefore, we apply a small weight to our guess. In this
example, then, our guess is only worth 5 observations. In Figure 13 below, we show
graphically how the Dirichlet process works. The guess for the distribution, $F_0(u)$, in
this example is exponential. The $\alpha$ is very low because we are not certain that the
guess made for the distribution is at all close to the actual distribution. Therefore, we
are putting only a little emphasis on our guess. From Figure 13, we can see that the guess
is indeed a bad one: it lies a significant distance from the actual distribution, $F(u)$,
which is a normal distribution with a mean of 0 and a standard deviation of 1. As more data
points are introduced to this process, the initial guess takes on less and less weight and
the actual distribution is approached based on the empirical data. Making a good initial
guess and heavily weighting it is beneficial in that the actual distribution can be reached
more rapidly than if we make a bad initial guess. Even with a bad initial guess and a small
$\alpha$, as in Figure 13, the estimated cdf will eventually converge to the actual cdf.
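
The behavior described above can be tried out with a short program. The sketch below is illustrative Python with our own function names, not the thesis implementation. It assumes the estimate is the usual weighted average of the prior guess and the empirical cdf, with weights $\alpha/(\alpha + n)$ and $n/(\alpha + n)$, and it mimics the situation of Figure 13: a poor exponential guess, a small $\alpha$, and data that are actually standard normal. The seed, the sample sizes, and the evaluation point $x = 0$ are arbitrary choices made for the demonstration.

```python
import bisect
import math
import random


def empirical_cdf(sorted_data, x):
    """Fraction of observations less than or equal to x."""
    return bisect.bisect_right(sorted_data, x) / len(sorted_data)


def dirichlet_estimate(x, data, prior_cdf, alpha):
    """Weighted average of the prior guess and the empirical cdf at x."""
    n = len(data)
    if n == 0:
        return prior_cdf(x)  # no data yet: the guess is all we have
    prior_weight = alpha / (alpha + n)
    return prior_weight * prior_cdf(x) + (1.0 - prior_weight) * empirical_cdf(sorted(data), x)


def exponential_guess(x, mean=1.0):
    """A deliberately poor prior guess: an exponential cdf."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


rng = random.Random(0)  # fixed seed so the run is repeatable
observations = [rng.gauss(0.0, 1.0) for _ in range(1000)]
for n in (0, 10, 100, 1000):
    estimate = dirichlet_estimate(0.0, observations[:n], exponential_guess, alpha=5)
    print(f"n = {n:4d}: estimated P(X <= 0) = {estimate:.3f}")
```

With no data the estimate equals the guess, which is 0 at $x = 0$; as observations accumulate it moves toward 0.5, the value of the standard normal cdf at 0.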


Figure  13:  cdf  Convergence  Using  the  Dirichlet  Process 
C.  SUMMARY 

This chapter first discussed two common techniques that we use to determine if
data from an application comes from a particular distribution. The third technique
discussed, the Dirichlet process, actually converges to the real distribution of the data. It is
important  to  note  that  there  are  many  statistical  approaches  that  could  be  implemented  to 
solve  our  problem.  These  particular  techniques  were  chosen  for  the  reasons  mentioned 
above. 
V. MONITORING RESOURCES USING MSHN’S CLIENT LIBRARY


Monitoring  resources  in  MSHN  is  done  by  using  the  MSHN  Client  Library  (CL) 
with  which  each  application  is  wrapped.  This  chapter  describes  how  an  application  is 
wrapped,  and  how  resources  are  monitored  using  the  MSHN  CL.  The  chapter  also 
discusses  some  recent  additions  to  the  CL  that  provide  finer  grained  data  than  did  the 
previous  CL  version. 

A.  THE  CLIENT  LIBRARY 

The  MSHN  CL  was  designed,  implemented,  and  tested  by  Matthew  Schnaidt,  a 
1998  graduate  of  the  Naval  Postgraduate  School.  Schnaidt’s  thesis  entitled,  “Design, 
Implementation,  and  Testing  of  MSHN’s  Resource  Monitoring  Library”  [SCHN98]  is  the 
primary  reference  used  for  this  chapter.  For  a  complete  understanding  of  the  MSHN  CL 
and  its  operation,  see  Schnaidt’s  thesis  [SCHN98].  Recently,  Shirley  Kidd  has  modified 
the  CL  to  include  finer  grained  output  information.  A  more  detailed  discussion  of  these 
modifications  can  be  found  in  Section  D  of  this  chapter. 

At  a  high  level,  the  operation  of  the  CL  can  be  described  as  follows.  An 
application  to  be  run  in  the  MSHN  environment  is  first  “wrapped”  (as  explained  in  the  next 
section)  by  the  CL.  Once  wrapped,  the  application  is  scheduled  and  eventually  run  on 
some  machine  in  the  MSHN  environment.  As  the  application  runs,  the  CL  records  the 
application’s  resource  usage  (e.g.,  the  amount  of  disk  space,  CPU  time  and  network  time 
used)  and  forwards  the  information  to  the  appropriate  MSHN  component  (e.g.,  the 
Resource  Requirements  Database).  In  the  next  section,  we  take  a  closer  look  at  what  it 
means  to  wrap  an  application,  as  wrapping  is  the  key  to  the  mechanism  used  to 
transparently  monitor  both  the  usage  and  status  of  resources  in  MSHN. 

B.  A  WRAPPED  APPLICATION 

In this section, we contrast the normal process of compiling and linking Bison with
one where Bison is wrapped with the MSHN CL.
Figure 14 below compares how an application, in this example Bison, is normally
compiled (left) with how the same application is compiled and linked with the MSHN
wrapper (right).


[Figure 14 panels: Ordinary Compilation of Bison (left); Wrapped Compilation of Bison (right)]

Figure 14: Ordinary Linking of Bison vs. Wrapped Linking of Bison


On the left side of Figure 14, the object file for Bison is linked with the appropriate
libraries needed to define the many system calls made during Bison’s execution. In Figure
14, we show bison.cc linking with the standard C libraries and the crt1.o file. When
linking is complete, the executable file, bison*, is created and the program can be used.
Looking at the right side of Figure 14, we see that the process is the same, but the files that
the object code is linked with are different. For the Client Library to produce the results
we are looking for (e.g., the number of reads, the number of bytes read, the number of
writes, and the number of bytes written for both local and network disks), the application
must link with modified C libraries and a modified crt1.o file. These modifications allow
the MSHN wrapper to intercept the process’s system calls, record the appropriate resource
usage and apparent resource status, and report this information to MSHN’s Resource
Requirements Database (RRD) and Resource Status Server (RSS). How the wrapper intercepts a
system call is discussed in further detail in the following section.

C.  MONITORING  RESOURCES 

As mentioned above, to monitor a resource in MSHN, the MSHN wrapper/CL must
intercept certain system calls prior to the calls reaching the OS. How this interception is
done is not important for this thesis; the interested reader is referred to [SCHN98].
For this thesis, we need only understand the general concept of how a system call is
intercepted and what happens subsequently. To aid in explaining the process, we refer to
Figure 15, below. Figure 15 is an event flow diagram from Schnaidt’s thesis [SCHN98]
illustrating what happens when a wrapped application invokes a read() system call. This
type of interception is invoked for those system calls that, once modified, can provide
MSHN with resource usage or status measurements.
Figure 15: Event Flow Diagram for read() From [SCHN98]

The interception process is as follows. The application makes a call to read().
Before the call reaches the Operating System (OS), the MSHN wrapper intercepts it. The
wrapper then looks in a file descriptor table (fdTable) that it maintains to determine the
type of read being called. The wrapper then passes the read() to the OS. The OS returns the
number of bytes read to the wrapper. The wrapper then updates the resource monitor and
returns the size of the read(), along with the data read, back to the application. This is
the general process that MSHN uses to monitor resource usage and status. The code
inserted and executed between the application and the read() system call is part of the
MSHN Client Library (CL). In the next section, we will look at the output of the prototype
MSHN CL, and discuss some enhancements that have recently been made to the CL in
order to provide more fine-grained information to the other components of MSHN.
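
The event flow just described can be imitated at a much higher level than the real wrapper. The sketch below is only an analogy written in Python: the actual CL intercepts calls by linking the application against modified C libraries, whereas here a hypothetical `ReadMonitor` class simply stands between the caller and `os.read`. The class, its `fd_table` and `usage` attributes, and the descriptor kind `"local_ipc"` are names invented for this illustration.

```python
import os
import time


class ReadMonitor:
    """Hypothetical stand-in for the wrapper's fdTable and resource monitor."""

    def __init__(self):
        self.fd_table = {}  # file descriptor -> kind of descriptor
        self.usage = {}     # kind -> accumulated read statistics

    def register(self, fd, kind):
        self.fd_table[fd] = kind

    def read(self, fd, size):
        kind = self.fd_table.get(fd, "unknown")  # determine the type of read
        start = time.perf_counter()
        data = os.read(fd, size)                 # pass the call on to the OS
        elapsed = time.perf_counter() - start
        stats = self.usage.setdefault(kind, {"reads": 0, "bytes": 0, "seconds": 0.0})
        stats["reads"] += 1                      # update the resource monitor
        stats["bytes"] += len(data)
        stats["seconds"] += elapsed
        return data                              # hand the data back to the caller


# Usage: read from a pipe through the monitor instead of calling os.read directly.
monitor = ReadMonitor()
read_end, write_end = os.pipe()
monitor.register(read_end, "local_ipc")
os.write(write_end, b"hello wrapper")
os.close(write_end)
payload = monitor.read(read_end, 1024)
os.close(read_end)
print(payload, monitor.usage)
```

The caller receives exactly what `os.read` returned, which reflects the transparency the wrapper is designed to provide; only the bookkeeping in `usage` reveals that the call was observed.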
D. CLIENT LIBRARY OUTPUT


In Schnaidt’s thesis [SCHN98], MSHN investigators identified the following
resource usage information as being important (i.e., worth recording) in aiding the MSHN
Scheduling Advisor in making scheduling decisions: Local File I/O, Network File I/O,
Terminal I/O, Network I/O, and statistics concerning Local Inter Process Communication
(IPC). Resource requirement information output by the prototype implementation of the
MSHN CL is shown in Figure 16.¹

<PROGTERMINATIONDATA> 

<APPWALLCLOCKRUNSEC>75.0117 

<SYSCPUSECS>1.04 

<USRCPUSECS>0.26 

<MAXRESSETSZ>0 

<UNSHAREDMEM>0 

<PAGEFAULTS>0 

<PHYSPAGENUM>192 

<VIRTUALPAGENUM>768 

<RESIDENTPAGENUM>620 


Figure 16: Some Resource Requirement Information Output by Initial Prototype of MSHN CL


The value associated with SYSCPUSECS in Figure 16 above shows that, over the
entire 75.0117 seconds of the program’s execution, the program executed in kernel mode
for only 1.04 seconds of CPU time. Similarly, the value associated with USRCPUSECS
indicates that the program executed in user mode for only 0.26 seconds. For wise
scheduling decisions, this information may not be fine-grained enough.

¹ The initial CL prototype also aggregated and output additional information concerning the end-to-end
communication resource status, but that information is ignored for simplicity in this thesis.

In order to get the fine-grained information required for this thesis, Shirley Kidd, a
member of the MSHN staff, modified the MSHN wrapper. The current version of the
MSHN CL now has the ability to report information about individual system calls, as well
as information concerning the program’s behavior between system calls. The MSHN CL
records the wall-clock time, as well as the user CPU and system CPU times, between calls.
Additionally, for each call, the CL records the type of call, the arguments passed to and
returned from the call, and the wall-clock time and the system and user CPU time spent in
that call. Example output from this new version is presented in Figure 17, and an
illustration of the meaning of this data is shown in Figure 18.
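
To make the kinds of intervals concrete, the following Python sketch records, for each monitored call, the wall-clock time inside the call, the wall-clock time since the previous monitored call returned, and the user and system CPU time consumed during the call. It is not the CL's code. The `CallRecord` and `CallTimer` names are hypothetical, and as a simplification the CPU times are measured only inside the call rather than also between calls.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class CallRecord:
    name: str
    call_wall: float     # wall-clock seconds spent inside the call
    between_wall: float  # wall-clock seconds since the previous monitored call returned
    user_cpu: float      # user-mode CPU seconds consumed during the call
    system_cpu: float    # kernel-mode CPU seconds consumed during the call


class CallTimer:
    def __init__(self):
        self.records = []
        self._last_return = time.perf_counter()

    def timed(self, name, func, *args):
        entered = time.perf_counter()
        cpu_before = os.times()
        result = func(*args)
        cpu_after = os.times()
        returned = time.perf_counter()
        self.records.append(CallRecord(
            name=name,
            call_wall=returned - entered,
            between_wall=entered - self._last_return,
            user_cpu=cpu_after.user - cpu_before.user,
            system_cpu=cpu_after.system - cpu_before.system,
        ))
        self._last_return = returned
        return result


# Usage: time two writes to the null device with some computation in between.
timer = CallTimer()
fd = os.open(os.devnull, os.O_WRONLY)
timer.timed("write", os.write, fd, b"first call")
sum(range(100_000))  # work done by the "application" between system calls
timer.timed("write", os.write, fd, b"second call")
os.close(fd)
for record in timer.records:
    print(record)
```

For calls this short the CPU columns will frequently read 0.0, because the process CPU clock behind `os.times()` is much coarser than the wall-clock timer.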
[Figure 17 column headings: System Call, SysCall Interval, Between Call Interval, UsrCpu Interval, SysCpu Interval]

Figure 17: Example Output From the MSHN Client Wrapper (in sec)
Figure 18: Example Execution Timeline of a Bison Application


Column 1 of Figure 17 shows every wrapped system call in the order it was made by the
application, Bison. Reading across the rows of the table, one can see how time was
consumed during that particular system call. Figure 18 is read from left to right, and top to
bottom. It is a continuous timeline of the program Bison during execution. We took the
times in Figure 17 and represented them in a timeline-like fashion. The timeline, though
accurately representing the data, is not precise in its fine detail. The author has estimated
this fine detail, e.g., the exact location of the system CPU interval in Write_term_io. The
call-out in Figure 18, for example, depicts the time intervals of the last wrapped system
call made. Using both Figure 17 and Figure 18 we can see that, of the total 0.000294089
seconds spent in the Write_term_io system call, some very small portion was spent in
user and system CPU space. It is important to note that even though Figure 17 shows
0.0000 seconds for several of the user CPU and system CPU intervals, these measurements
are not correct. These intervals were so small that the timing routines used could not
measure them. A full explanation of the importance of this data can be
found in Chapter VI.

E.  SUMMARY 

In  this  chapter,  we  discussed  how  applications  being  scheduled  to  run  in  the  MSHN 
environment  are  wrapped.  We  discussed  how  the  CL  monitors  the  resources  an 
application  is  using,  and  finally,  we  discussed  the  contents  of  the  CL’s  output  and  some  of 
the  new  capabilities  recently  added  to  the  CL. 




VI.  EXPERIMENT 


This  chapter  describes  the  experiment  designed  to  determine  if  the  output  of  the 
statistical  applications  used  in  this  thesis  will  prove  useful  for  future  stochastic  scheduling 
algorithms.  Section  A  describes  (1)  how  the  statistical  applications  were  designed  and 
implemented  using  Java,  (2)  the  Bison  input  file  used  for  this  experiment,  and  (3)  how  we 
wrapped,  ran  and  extracted  the  appropriate  data  from  the  Bison  application.  Section  B 
describes  the  experiment’s  methodology,  and  how  and  why  the  experiment  was  run. 
Section  C  presents  the  results  of  the  experiment.  Section  D  describes  our  analysis  of  the 
results  and  our  conclusions.  Section  E  is  a  summary  of  this  chapter. 

A.  DESIGN  OF  THE  EXPERIMENT 

The  design  of  this  experiment  is  best  explained  in  two  phases:  Phase  1  describes 
the  wrapping  of  Bison,  and  the  pre-processing  of  MSHN  Client  Library  (CL)  output. 
Phase  2  details  the  design  and  implementation  of  the  statistical  package  developed  for  this 
thesis. 


1.  Wrapping  and  running  the  application  and  pre-processing  CL  output 

This  section  describes  Bison,  the  application  we  chose  to  wrap,  and  the  process 
used  to  wrap  it.  The  section  also  describes  the  pre-processing  done  to  the  MSHN  CL 
output  file  prior  to  the  execution  of  the  experiment. 

Bison is a general-purpose parser generator that converts a formal description of an
LALR(1) context-free grammar into a C program that parses that grammar. We selected
Bison  as  the  application  to  wrap  because  the  Bison  code  was  readily  available. 

Wrapping  Bison  was  fairly  simple.  There  are,  however,  a  few  comments  worth 
noting.  First,  a  full  tutorial  on  wrapping  system  calls  and  wrapping  applications  is 
available  in  Schnaidt’s  thesis  [SCHN98].  This  tutorial  should  be  completed  in  its  entirety 
before  attempting  to  wrap  any  executable.  The  tutorial  first  explains  the  process  of 
wrapping  a  system  call  and  then  explains  wrapping  an  application.  Wrapping  a  system  call 
and  wrapping  an  application  are  very  similar  tasks;  as  such,  it  is  easy  to  confuse  the  two. 
The second note worth mentioning is that the current version of the wrapper only
intercepts certain system calls (i.e., only the system calls that have been wrapped by the
MSHN staff). If a user wants to get information about a certain system call, that call must
first be wrapped. For example, if the user wants to obtain usage information for the system
calls fork() and exec(), these calls have to first be wrapped. The final observation
about wrapping an application is in reference to the many libraries that a typical
application must link with. Often errors will appear after all of the steps for wrapping an
application have been completed and the make command is invoked. In the author’s
experience, these errors occur because some of the libraries needed for linking
are missing. To fix the errors, simply determine which libraries are missing and then add
them to the linking path in the application’s makefile.

To ensure the wrapper did not change the output of the Bison application, we ran
both the unwrapped and the wrapped versions of Bison on the same input grammar, written
by Gary Stone (a Ph.D. student at the Naval Postgraduate School), and compared the
programs they generated. The resulting output files were identical. The
only difference between the two runs was that the execution times were significantly
different. The wrapped version of Bison took approximately 24 times longer than the
unwrapped version: 5 seconds for the unwrapped version and 120 seconds for the wrapped
version. The added overhead comes from additional code added to the wrapper to output
the collected fine-grained data to the terminal. The added overhead would not occur under
normal use of the MSHN CL, as the code would be optimized and the output data cached.

After determining that the wrapped Bison application was working properly, we
obtained a much larger grammar file to use in our test. The grammar file chosen was the
parse.y file found in GCC. We chose this grammar file because it produces much more
data than that used in the initial comparison test. Processing this particular file took 635.58
seconds  and  produced  the  output  found  in  Appendix  A.  While  all  of  the  data  in  Appendix 
A  could  be  useful,  we  are  only  interested  in  the  “Between  System  Call”  and  “System  Call” 
intervals  that  produced  over  50  data  elements. 

Rather than sifting through the MSHN CL output and manually counting and extracting
the intervals that had over 100 entries, Ramesh Mantri, a programmer on the MSHN staff,
wrote a Perl script that parsed out the numerical data needed. A sample of this data is
shown below in Figure 19.
0.000476003
0.000561953
0.000468016
0.00049901
0.000468016
0.000550985
0.000764966
0.000491023
0.000452042
0.00051105
0.000433922
0.000495911
0.000429034
0.000586987
0.000467896
0.000584006

Figure 19: Example of Data Elements Extracted From MSHN CL Using Perl Script

The  data  elements  in  Figure  19  are  now  in  a  form  useable  by  the  applications 
designed  for  our  tests  in  this  thesis. 
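
The extraction step can be pictured with the following sketch. It is written in Python and is not the Perl script mentioned above. The exact layout of the CL output is not reproduced in this chapter, so the sketch assumes a hypothetical layout of one line per call containing the call name followed by the system call, between-call, user CPU, and system CPU intervals. The sample lines, the call name Read_local_file, and the threshold are made up for the demonstration.

```python
from collections import defaultdict


def extract_intervals(lines, min_count):
    """Group interval columns by call name, keeping calls with more than min_count entries."""
    syscall = defaultdict(list)
    between = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # blank or malformed line
        try:
            call_seconds, between_seconds = float(fields[1]), float(fields[2])
        except ValueError:
            continue  # header line or other non-numeric row
        syscall[fields[0]].append(call_seconds)
        between[fields[0]].append(between_seconds)
    keep = [name for name, values in syscall.items() if len(values) > min_count]
    return ({name: syscall[name] for name in keep},
            {name: between[name] for name in keep})


# Hypothetical CL output in the assumed five-column layout.
sample_output = """\
Call            SysCall      Between      UsrCpu  SysCpu
Write_term_io   0.000476003  0.000561953  0.0000  0.0000
Write_term_io   0.000468016  0.000499010  0.0000  0.0000
Read_local_file 0.000764966  0.000491023  0.0000  0.0000
Write_term_io   0.000452042  0.000511050  0.0000  0.0000
"""

syscall_intervals, between_intervals = extract_intervals(sample_output.splitlines(), min_count=2)
for name, values in syscall_intervals.items():
    print(name, values)
```

Each resulting list is a plain column of numbers of the kind shown in Figure 19, ready to be handed to a statistics routine.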

2.  Designing  and  implementing  the  statistics  application 

This section describes (1) the statistical packages used in this thesis’ experiments,
(2) how the packages were implemented, and (3) their validation.

As  discussed  in  previous  chapters,  one  of  the  goals  of  this  thesis  is  to  make  the 
MSHN  CL  data  more  useful  for  other  MSHN  components.  To  accomplish  this,  we 
aggregate  and  describe  this  data  using  statistics. 
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Java  was  chosen  as  the  language  in  which  to  write  our  applications,  because  it  is  an 
object-oriented  language  and  has  many  built-in  classes  useful  to  us.  The  overall  goal  was 
to  first  build  an  object  that  contained  an  array  of  data  elements  as  well  as  statistics 
describing  that  data.  That  object  could  then  be  passed  to  any  or  all  of  the  three-distribution 
tests  written  for  this  thesis. 

Our  experiment  needed  to  obtain  the  following  descriptive  statistics  for  our  data: 

•  Sum 

•  Mean 

•  Median 

•  Variance 

•  Standard  deviation 

•  Largest  data  element 

•  Smallest  data  element 

•  Range 

The  above  statistics  are  used  in  at  least  one  of  our  distribution  tests.  As  stated  above,  an 
object  is  created  containing  an  array  of  data  elements  along  with  all  of  the  above  statistics. 
The  data  from  the  file  generated  by  the  wrapped  CL  fills  the  array.  In  subsequent  versions 
our  goal  is  to  read  the  data  from  a  stream.  Next,  the  Java  application  invokes  a  method  to 
sort  the  data  elements.  Finally,  the  application  calls  the  appropriate  statistical  method.  The 
code  for  creating  the  data  object  is  in  Appendix  A. 
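
As a minimal sketch of the procedure just described (fill an array, sort it, then call the statistical methods), the fragment below computes several of the listed statistics. The class and method names are hypothetical and chosen only for illustration; this is not the Appendix A code. It assumes a non-empty data set and the sample variance (divisor n - 1).

```java
import java.util.Arrays;

// Hypothetical illustration; not the DataObject class from Appendix A.
class DescriptiveStatsSketch {
    final double[] sorted;   // data elements in ascending order

    DescriptiveStatsSketch(double[] values) {
        sorted = values.clone();
        Arrays.sort(sorted);   // sort once so min, max and median are simple lookups
    }

    double sum() {
        double total = 0.0;
        for (double v : sorted) total += v;
        return total;
    }

    double mean()  { return sum() / sorted.length; }
    double min()   { return sorted[0]; }
    double max()   { return sorted[sorted.length - 1]; }
    double range() { return max() - min(); }

    // Middle element for an odd count; average of the two middle elements for an even count.
    double median() {
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2]
                            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Sample variance: squared deviations from the mean divided by (n - 1). Assumes n >= 2.
    double variance() {
        double m = mean();
        double sumSquares = 0.0;
        for (double v : sorted) sumSquares += (v - m) * (v - m);
        return sumSquares / (sorted.length - 1);
    }

    double stdDev() { return Math.sqrt(variance()); }

    public static void main(String[] args) {
        DescriptiveStatsSketch s =
            new DescriptiveStatsSketch(new double[] {3.0, 1.0, 2.0, 6.0});
        System.out.println("mean=" + s.mean() + " median=" + s.median()
                           + " range=" + s.range() + " stdDev=" + s.stdDev());
    }
}
```

Sorting once in the constructor keeps the later lookups trivial, which mirrors the order of operations described above.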

Next, we built two distribution tests, the Kolmogorov-Smirnov (K-S) and Anderson-
Darling (A-D) “goodness of fit” tests, and a Dirichlet process. Chapter IV explains these
applications in detail. Each of the tests and the process takes the above data object as input.

The “goodness of fit” tests are simple hypothesis tests. In both the K-S and the A-D
tests for this thesis, the null hypothesis is either that the data comes from a population with
an underlying normal distribution or that the data comes from a population with an
underlying exponential distribution. In each case, a test statistic is computed for the
hypothesized (normal or exponential) distribution, a numerical adjustment is made to the
statistic, and the result is compared to a value in a look-up table. Depending on where the
adjusted statistic falls in the table, the null hypothesis is either accepted or rejected at a
given level of significance.

The K-S and A-D “goodness of fit” tests were built using the equations and tables in
[LAWK91]. We chose those particular methods because Law and Kelton reduced the
size of the look-up tables with which the adjusted statistic is compared, which saves a
significant amount of time at no cost in accuracy. Both of the “goodness of fit” test
applications were built in the same manner. Each test takes as input a data object. The
data elements of the object are then used to compute the observed cumulative distribution
function (cdf). We then look for the largest difference between the observed cdf and a
hypothesized cdf. This difference then becomes our statistic. Before we can compare this
statistic to the values in our table, we must make the appropriate mathematical adjustment
to the statistic according to the formulas (see [LAWK91]). The adjustment is based on
the form of the hypothesized distribution. Comparing the computed result with the values in
the table allows us to accept or reject the null hypothesis. The source code for these
applications is in Appendix A.
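
The K-S procedure described above can be sketched as follows for the exponential case, where the hypothesized cdf F(x) = 1 - exp(-x/mean) is easy to evaluate. This is a minimal sketch under stated assumptions, not the Appendix A code: the class and method names are hypothetical, the adjustment formula is the one we understand [LAWK91] to give for an exponential distribution with an estimated mean (it should be checked against that reference), and 1.308 is the 99% critical value reported in our results.

```java
import java.util.Arrays;

// Hypothetical illustration of a K-S test against an exponential cdf.
class KsExponentialSketch {

    // Largest distance between the observed (empirical) cdf and the
    // hypothesized exponential cdf whose mean is estimated from the sample.
    static double ksStatistic(double[] sample) {
        double[] x = sample.clone();
        Arrays.sort(x);
        int n = x.length;

        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = 1.0 - Math.exp(-x[i] / mean);   // hypothesized cdf at x[i]
            double above = (i + 1.0) / n - f;          // empirical step just after x[i]
            double below = f - (double) i / n;         // empirical step just before x[i]
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }

    // Assumed adjustment for the exponential case (to be verified against [LAWK91]).
    static double adjustForExponential(double d, int n) {
        double rootN = Math.sqrt(n);
        return (d - 0.2 / n) * (rootN + 0.26 + 0.5 / rootN);
    }

    // Reject the null hypothesis when the adjusted statistic exceeds the critical value.
    static boolean rejects(double adjustedStatistic, double criticalValue) {
        return adjustedStatistic > criticalValue;
    }

    public static void main(String[] args) {
        double[] sample = {0.5, 1.2, 0.1, 2.3, 0.9, 0.4, 3.1, 0.7};
        double d = ksStatistic(sample);
        double adjusted = adjustForExponential(d, sample.length);
        System.out.println("D = " + d + ", adjusted = " + adjusted
                           + ", reject at 99%: " + rejects(adjusted, 1.308));
    }
}
```

Testing for a normal distribution follows the same pattern with a normal cdf and its own adjustment and critical value.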

In  addition  to  the  “goodness  of  fit”  tests,  we  also  built  an  application  based  upon  a 
process  that  converges  to  the  actual  distribution  of  the  data.  The  method,  as  discussed  in 
Chapter  IV,  is  called  the  Dirichlet  Process.  The  source  code  for  this  method  is  also  in 
Appendix  A.  Presently,  the  Dirichlet  Process  built  for  this  thesis  can  only  determine  the 
distribution  for  one  data  element  at  a  time.  Future  work  is  needed  to  add  a  loop  to  compute 
the  distribution  for  every  data  element  in  a  particular  set  of  data. 


B.  EXPERIMENT  METHODOLOGY 

The experiment for this thesis was designed to show that the above statistical tests
and processes can provide useful information to the MSHN components. By useful, we
mean that we can either accept or reject that the underlying distribution is from a particular
family of distributions. Whether the tests accepted or rejected the null hypothesis, i.e., that
the data is from a particular distribution, that information can be provided to and
potentially improve the performance of subsequent (stochastic) scheduling algorithms. In
addition, if the tests reject the null hypothesis, those distributions can be eliminated in the
RSS as being associated with the resource/data.

For this experiment, Bison is wrapped using the MSHN CL and run on a parse.y file
extracted from the GNU C Compiler (GCC). When the Bison application finishes, the
MSHN CL output is run through our Perl program to extract the needed data elements.
Once the data elements are in a file, data objects are created and their elements tested to
see if those elements come from a population with an underlying normal distribution, an
underlying exponential distribution, or neither.

C.  RESULTS 

After executing our wrapped version of Bison and running our Perl script on the
MSHN CL output, we obtained the data elements found in Appendix B. For each of the
six data sets, we created data objects and computed the descriptive statistics. The
descriptive statistics for each data set are located in Appendix B immediately after its
data set. Using the descriptive statistics and the data set as input, we ran the two
goodness-of-fit tests. The results of these tests are in Appendix C.

Tables 6, 7 and 8 below show the data, the descriptive statistics of the data
(computed using our statistical methods), and the “goodness of fit” test results
(computed using the “goodness of fit” tests). Figure 20 shows the descriptive statistics and
a histogram of the data computed using MINITAB^.


^ MINITAB® is a statistical software package that was developed over 25 years ago to make data analysis
easier. The version used in this thesis is version 12, ©1998.

0.0001320

0.0001349 

0.0001349 

0.0001330 

0.0001360 

0.0001321 

0.0001309 

0.0001370 

0.0001351 

0.0001320 

0.0001320 

0.0001320 

0.0001340 

0.0001310 

0.0001321 

0.0001329 

0.0001340 

0.0001321 

0.0001320 

0.0001301 

0.0001330 

0.0001320 

0.0001329 

0.0001400 

0.0001340 

0.0001321 

0.0001320 

0.0001310 

0.0001370 

0.0001330 

0.0001310 

0.0001340 

0.0001321 

0.0001321 

0.0001340 

0.0001330 

0.0001340 

0.0001330 

0.0001961 

0.0001329 

0.0001310 

0.0001329 

0.0001329 

0.0001409 

0.0001330 

0.0001330 

0.0001340 

Table 6: “Write Remote File Between Call Interval” Data Captured using the MSHN Client Library


The  Data  object  has  47  elements  in  its  array 


DESCRIPTIVE  STATISTICS 


Minimum = 1.30057E-4
Maximum = 1.96099E-4
Range = 6.604200000000002E-5
Sum = 0.006326914999999999
Mean = 1.3461521276595742E-4
median = 1.32918E-4
Variance = 8.848993495374656E-11
Std Dev = 9.406908894729796E-6
Skewness = 6.334653681690928
Kurtosis = 42.011280023941055

95% CI = 2.7184671756673306E-6 conducted with sample StdDev NOT population StdDev


Table 7: Descriptive Statistics for WRFBCI Data

“Write Remote File Between Call Interval” Results

TEST: K-S test for normal distribution
NULL HYPOTHESIS: Data is from a population with an underlying normal distribution
RESULTS: The adjusted K-S test statistic 2.48335644275209026 is greater than the modified critical value 1.035. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST: K-S test for exponential distribution
NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted K-S test statistic 4.422383945811191 is greater than the modified critical value 1.308. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.

TEST: A-D test for normal distribution
NULL HYPOTHESIS: Data is from a population with an underlying normal distribution
RESULTS: The adjusted A-D test statistic 11.454589644697467 is greater than the modified critical value 1.029. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST: A-D test for exponential distribution
NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted A-D test statistic 20.68485876216289 is greater than the modified critical value 1.943. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.

Table 8: “Goodness Of Fit” Test Results using Our “Goodness Of Fit” Application

[Figure 20 content: Minitab “Descriptive Statistics” output for variable wrfbci. The histogram and the plotted 95% confidence intervals for Mu and for the Median are not reproduced; the numerical summary follows.]

Anderson-Darling Normality Test
  A-Squared:  11.011
  P-Value:    0.000

Mean          1.35E-04
StDev         9.41E-06
Variance      8.85E-11
Skewness      6.33475
Kurtosis      42.0100
N             47

Minimum       1.30E-04
1st Quartile  1.32E-04
Median        1.33E-04
3rd Quartile  1.34E-04
Maximum       1.96E-04

95% Confidence Interval for Mu:      1.32E-04  1.37E-04
95% Confidence Interval for Sigma:   7.82E-06  1.18E-05
95% Confidence Interval for Median:  1.32E-04  1.33E-04

Figure 20: Descriptive Statistics and Histogram of WRFBCI Data Computed using Minitab


The data in Table 6 above is one of six different sets of data obtained from the
MSHN Client Library (CL) after running a wrapped version of Bison. Each of the 47 data
elements in Table 6 represents the time interval between successive “write to a remote file”
system calls. The set of data in Table 6 is only a representative sample and is presented for
discussion simply because the number of data elements is neither too large nor too small.

The  descriptive  statistics  generated  by  this  thesis’  statistical  applications  are 
presented  in  Table  7.  In  MSHN,  these  statistics  will  be  stored  in  the  Resource 
Requirements  Database  (RRD)  for  possible  use  by  other  MSHN  components  such  as  the 
MSHN  Scheduling  Advisor  (SA).  Future  work  must  be  done  to  update  these  statistics  as 
soon  as  new  data  arrives. 

With the information in Table 7, the “goodness of fit” tests can now be run to fit
the data to a distribution. The results of our “goodness of fit” tests are presented in Table 8.
As discussed in Chapter IV, we attempt to “fit” our data to two distributions, the normal
distribution and the exponential distribution. In addition, we built two “goodness of fit”
tests to check for each type of distribution.

In this experiment, both the Kolmogorov-Smirnov (K-S) and the Anderson-Darling
(A-D) tests rejected the null hypothesis in favor of the alternative. The computed statistic
for the K-S test is 2.4833 and the critical value, the value that we are comparing against, is
1.035. Clearly, 2.4833 is larger than 1.035 and, therefore, the null hypothesis is rejected.
The computed Anderson-Darling (A-D) statistic is 11.4545 and the critical value is 1.029.
Here too the statistic is larger than the critical value and the null hypothesis is rejected in
favor of the alternative. Both tests rejected the null hypothesis at the largest possible
Confidence Interval (CI), 99%.

The outcome is similar when testing the “fit” for the exponential distribution. The
computed K-S statistic for the exponential distribution for this experiment is 4.4223. The
critical value using a 99% CI is 1.308. Again, the computed statistic, 4.4223, is larger than
the critical value, 1.308, and the null hypothesis is rejected in favor of the alternative. In
the case of the A-D test, the computed A-D statistic for the data, 20.68, is significantly
larger than the critical value, 1.943, and the null hypothesis is rejected in favor of the
alternative. Like the tests for normality, these tests failed at the 99% CI.
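
The exponential A-D test just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the Appendix A code: the class and method names are hypothetical, the statistic is the standard A-D form A² = -n - (1/n) Σ (2i - 1)[ln F(x(i)) + ln(1 - F(x(n+1-i)))], the data values are assumed to be strictly positive, and the adjustment factor (1 + 0.6/n) is the one we understand [LAWK91] to give for an exponential distribution with an estimated mean (it should be checked against that reference). The critical value 1.943 is the one reported in Table 8.

```java
import java.util.Arrays;

// Hypothetical illustration of an A-D test against an exponential cdf.
class AdExponentialSketch {

    // Unadjusted A-D statistic for an exponential cdf whose mean is estimated
    // from the sample. Assumes every value is strictly positive.
    static double adStatistic(double[] sample) {
        double[] x = sample.clone();
        Arrays.sort(x);
        int n = x.length;

        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        double total = 0.0;
        for (int i = 1; i <= n; i++) {
            // ln F(x(i)), where F(x) = 1 - exp(-x/mean)
            double logCdf = Math.log(1.0 - Math.exp(-x[i - 1] / mean));
            // ln(1 - F(x(n+1-i))) simplifies to -x/mean for the exponential cdf
            double logTail = -x[n - i] / mean;
            total += (2 * i - 1) * (logCdf + logTail);
        }
        return -n - total / n;
    }

    // Assumed adjustment for the exponential case (to be verified against [LAWK91]).
    static double adjustForExponential(double aSquared, int n) {
        return (1.0 + 0.6 / n) * aSquared;
    }

    public static void main(String[] args) {
        double[] sample = {0.5, 1.2, 0.1, 2.3, 0.9, 0.4, 3.1, 0.7};
        double aSquared = adStatistic(sample);
        double adjusted = adjustForExponential(aSquared, sample.length);
        System.out.println("A^2 = " + aSquared + ", adjusted = " + adjusted
                           + ", reject at 99%: " + (adjusted > 1.943));
    }
}
```

Unlike the K-S statistic, which looks only at the single largest gap between the two cdfs, the A-D statistic accumulates a weighted discrepancy over every data element and gives more weight to the tails.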

Figure 20 shows Minitab’s descriptive statistics and histogram of our “write remote
file between call interval” data. For all intents and purposes, the descriptive statistics
computed by Minitab and computed using our application are identical, and thus validate
our descriptive statistics application. The histogram is presented to let the reader see that
our “goodness of fit” tests are accurate in rejecting the null hypothesis. In Figure 20,
Minitab also reports an A-D statistic for a normal distribution. Therefore, we can compare
our adjusted A-D statistic for a normal distribution, 11.4545, with Minitab’s actual A-D
statistic, 11.011. The two statistics are relatively close. The reason for the
difference is that the method we used actually computes an adjusted A-D statistic. The
adjustment to the A-D statistic allows us to use a modified set of critical values
([LAWK91]). Had we not made any adjustments to the A-D statistics in our method, the
two statistics would have been identical, but we would have then had to use a complete A-D
critical values table.

Appendices A-D show the results of the experiments performed on each of the six
sets of data. In every instance, the null hypothesis was rejected in favor of the alternative.
Therefore, our data does not come from either a population with an underlying normal
distribution or a population with an exponential distribution. In fact, a couple of our data
sets, when plotted, appear to have more than one mode. Upon further thought, the nature of
the intervals measured (e.g., “Read Local File System Call Interval”, “Write Remote File
Between Call Interval”) makes them more likely to be multi-modal.

For example, the “Write Remote File Between Call Interval” does not pay any
attention to what code is executing between the calls. A typical program will likely
perform “Write Remote File Calls” for several different purposes and at several different
locations in the code. The most frequent calls might be for writing data records. Less
frequent calls might be for the attachment of header and trailer information to the data
packet. The length of the interval between calls would differ depending upon the purpose
of the code within which the calls are made, leading to a multi-modal distribution for the
“Between Call Interval”. Additionally, multi-modal distributions could result from
relatively infrequent interrupts from Operating System (OS) maintenance calls that cause the
program to be swapped out (e.g., from file system maintenance routines).

D.  CONCLUSIONS 

Our experiments have shown that we can compute descriptive statistics of the data
we receive from the MSHN CL and we can fit that data to any number of distributions.
Other distributions can be “fit” by simply building the test as we did for the normal and
exponential distributions. For the data sets studied in our experiments, our algorithms
conclude that our data elements do not come from a population with either an underlying
normal or an underlying exponential distribution.

We  have  discussed  and  partially  built  a  Dirichlet  process.  The  Dirichlet  process 
computes  an  approximate  distribution  that  converges  to  the  actual  distribution  of  a  set  of 
data.  With  more  work,  the  Dirichlet  process  could  be  implemented  into  MSHN. 

We now have the ability to store all of the descriptive statistics and the K-S and
Anderson-Darling statistics in the Resource Requirements Database. The next step is to
update these statistics as new data becomes available.

E. SUMMARY


In this chapter, we described the design of our experiment. We discussed how the
applications for calculating the descriptive statistics and the distribution tests work and were
constructed. We presented and analyzed the results of the experiment and finally presented
our conclusions.

VII. SUMMARY AND FUTURE WORK


A.  SUMMARY 

This thesis is broken down into seven chapters. In the introduction, we discussed
Smart Net, the first scheduling framework for heterogeneous computing that considered,
when scheduling jobs in its environment, both a job’s affinity for running on particular
machines and the availability of machines. We introduced the Management System for
Heterogeneous Networks (MSHN), an improved management system designed for
research into how to provide all users with optimal Quality of Service. We discussed the
scope of this thesis, its major contributions, and its organization.

In  Chapter  II,  we  discussed  MSHN  in  great  detail.  We  covered  its  evolution,  goals, 
components,  functionality  and  why  we  believe  distribution  statistics  can  be  useful  to 
MSHN.  We  focused  primarily  on  the  MSHN  Client  Library  (CL)  because  it  generated  the 
data  we  needed  for  our  experiments.  We  discussed  why  using  only  the  mean  in  scheduling 
decisions  is  inadequate.  And  we  posed  our  hypothesis:  that  we  can  provide  more  useful 
information  than  just  the  mean  to  future  stochastic  scheduling  algorithms. 

Chapter  III  covered  previous  works  performed  in  the  areas  of  resource  monitoring 
and  Stochastic  Scheduling  Algorithms.  We  looked  at  six  different  Resource  Management 
Systems  (RMSs)  and  tools,  and  discussed  how  they  conduct  resource  monitoring.  We 
discussed  some  of  the  pros  and  cons  of  the  methods  those  RMSs  and  tools  choose  to  use. 
We  then  discussed  briefly  Stochastic  Scheduling  Algorithms.  These  types  of  algorithms, 
or  future  algorithms  like  them,  will  make  use  of  the  information  that  we  provide  in  this 
thesis. 

Chapter IV described the formal definition of our problem and the approaches that
we have chosen to solve this problem. We discussed the need for more useful information
about the data that we get from the MSHN CL, and how that more useful information can
be provided using statistics. We described why a set of data’s underlying distribution can
lead to more accurate scheduling when used in combination with Stochastic Scheduling
Algorithms currently being developed. We then described two “goodness of fit” tests, the
Kolmogorov-Smirnov and the Anderson-Darling tests, chosen to see if a particular type of
data obtained from the MSHN CL is either normal or exponential. We pointed out that, due
to time constraints, these distributions are the only two we tested. With more time and
research, several other tests could be implemented, making the MSHN scheduler even
more effective. While these tests only confirm or deny that a particular distribution is
associated with a set of data, our third approach eventually converges to the actual
distribution. This approach is called the Dirichlet Process. Each of these three approaches
was explained in great detail along with short examples for clarity.

Chapter  V  is  a  complete  discussion  on  how  the  MSHN  CL  monitors  resources  in 
the  MSHN  environment.  The  first  part  of  this  chapter  described  how  to  wrap  an 
application  and  gives  an  example  of  how  we  wrapped  BISON.  The  second  part  of  the 
chapter  covered  how  MSHN  intercepts  and  uses  system  calls  to  monitor  resources. 
Finally,  we  provide  output  from  the  MSHN  CL  for  both  a  wrapped  and  unwrapped 
application  for  comparison. 

Chapter  VI  is  our  thesis  experiment.  This  chapter  described  how  we  built  the 
methods  used  to  compute  a  data  set’s  descriptive  statistics.  We  discussed  the  design  of  the 
experiment  and  our  experiment  methodology.  And  finally,  we  presented  our  results  and 
stated  our  conclusions. 

Our final chapter, Chapter VII, summarizes this thesis and discusses possible future
work.

B.  FUTURE  WORK 

There are many possibilities for future work related to this thesis. Listed below are
five areas that would be beneficial to both MSHN and other resource management systems
in distributed environments.

1. Stochastic Scheduling Algorithms - While we know these algorithms exist and are a
very active research area, we do not know the important details of how they work.
Research needs to be conducted on what type of input such algorithms take. Once we
know the input for such algorithms, the work done in this thesis can be modified and
improved.

2. Implementing the code in this thesis in MSHN - The code used in this thesis was
designed specifically to conduct our experiments. This code must be extended and
made to interface with the MSHN CL, the MSHN RRD and the MSHN RSS.

3. Windowing mechanism for data - The code must also be made to read streams of data
as opposed to reading data from a file. In addition to reading streams of data, a
windowing mechanism must be built. A windowing mechanism prevents early data
from anchoring more current data. Anchoring is an important concept because it
could lead to wrong estimates of the underlying distributions.

4.  Implementing  tests  for  other  distributions  -  This  thesis  tested  for  only  two  types  of 
distributions,  the  normal  and  the  exponential.  Many  other  distributions  exist  and 
should  be  tested  for.  For  example,  the  log-normal  and  Weibull  distributions  could  be 
tested. 

5.  Multi-modal  distributions  -  As  it  turned  out,  some  of  the  distributions  found  in  our 
experiments  showed  signs  of  being  multi-modal.  We  must  have  some  method  of 
dealing  with  such  distributions.  Perhaps  convolving  or  combining  distributions  would 
be  possible. 
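
The windowing mechanism proposed in item 3 could take many forms. As a minimal sketch, assuming a fixed-size window over the most recent observations, the class below keeps a running mean that forgets old data once the window is full. The class name, the method names and the window size are hypothetical and are not part of MSHN.

```java
import java.util.ArrayDeque;

// Hypothetical illustration of a fixed-size sliding window over streamed data.
class SlidingWindowSketch {
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private final int capacity;    // maximum number of recent observations kept
    private double sum = 0.0;      // running sum of the values currently in the window

    SlidingWindowSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Add the newest observation and drop the oldest one once the window is full,
    // so that early data cannot anchor the statistics.
    void add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > capacity) {
            sum -= window.removeFirst();
        }
    }

    int size() { return window.size(); }

    // Mean of the observations currently in the window (NaN if the window is empty).
    double mean() {
        return window.isEmpty() ? Double.NaN : sum / window.size();
    }

    public static void main(String[] args) {
        SlidingWindowSketch recent = new SlidingWindowSketch(3);
        for (double v : new double[] {1.0, 2.0, 3.0, 4.0}) {
            recent.add(v);
        }
        System.out.println("window size = " + recent.size() + ", mean = " + recent.mean());
    }
}
```

The same idea extends to the other descriptive statistics, although order statistics such as the median would require keeping the window contents sorted or re-sorting on demand.
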
APPENDIX A: CODE FOR DATA OBJECT AND STATISTICAL TESTS


DataObject

/**
 * DataObject
 * @author MAJ Tom Cook
 * @version JDK 1.2
 * This program creates a Data Object. The Data Object consists of an array of
 * data and descriptive statistics of that data. The descriptive statistics are:
 * sum, min, max, range, mean, variance, standard deviation, skewness, kurtosis,
 * 95% CI of mean, and the number of data elements in the array
 */

//____________________________________________________________

import java.io.*;
import java.util.*;

//____________________________________________________________


public class DataObject
{
    // VARIABLE DECLARATIONS AND CONSTRUCTOR
    //____________________________________________________________

    protected double data[];                // array to hold data
    protected double sum = 0.0;             // sum of data in array
    protected double variance = 0.0;        // variance of data in array
    protected double mean = 0.0;            // mean of the data in the array
    protected double median = 0.0;          // median of the data in the array
    //protected double mode = 0.0;          // mode of the data in the array
    protected double stdDev = 0.0;          // StdDev of the data in the array
    protected double min = 0.0;             // minimum of data in array
    protected double max = 0.0;             // maximum of data in array
    protected double range = 0.0;           // range of data in array
    protected double skewness = 0.0;        // skewness of data
    protected double kurtosis = 0.0;        // kurtosis of data POS = peak, NEG = flat
    protected double confidence95 = 0.0;    // 95% CI of the sample mean
    protected int numDataElem = 0;          // number of data elements in array
    protected long timeObjBuilt;            // time the object was created
    protected Calendar rightNow;

    // constructor to create a dataObject object
    public DataObject()
    {
        String dataFromFile;                // data from file read as a string
        double dataElement;                 // convert data element to a double
        boolean EOF = false;                // End Of File
        data = new double[300];             // new 300 element array named data

        try
        {
            String dirName = "c:/data/Parse^yData";    // Directory path
            String fileName = "wrfbci.txt";            // File Name
            File MyData = new File(dirName, fileName);
            BufferedReader DataIn = new BufferedReader(
                                        new FileReader(MyData));

            while (!EOF)
            {
                try
                {
                    // readLine() returns null at end of file, which makes
                    // parseDouble() throw the NullPointerException caught below
                    dataFromFile = DataIn.readLine();
                    dataElement = Double.parseDouble(dataFromFile);

                    // if the array is full make another twice as big and copy the old into the new
                    if (numDataElem == data.length)
                    {
                        double newData[] = new double[data.length * 2];
                        for (int idx = 0; idx < data.length; idx++)
                            newData[idx] = data[idx];
                        data = newData;
                    }

                    // fill the array
                    data[numDataElem] = dataElement;
                    numDataElem++;
                }
                // catch for an end of file exception
                catch (EOFException e)
                {
                    System.out.println(" In EOFException ");
                    EOF = true;
                }
                // catch for a null pointer exception
                catch (NullPointerException e)
                {
                    EOF = true;
                }
            }

            // time object was built

            // sort the array in ascending order
            Arrays.sort(data, 0, numDataElem);

            // Get the descriptive statistics of the data
            // System.out.println are debug statements
            min = min();
            //System.out.println("The minimum of the data is " + min);
            max = max();
            //System.out.println("The maximum of the data is " + max);
            range = range();
            //System.out.println("The range of the data is " + range);
            sum = sum();                    // Compute the sum of the data
            //System.out.println("The sum of the data is " + sum);
            mean = mean();                  // Compute the mean of the data
            //System.out.println("The mean of the data is " + mean);
            median = median();              // Compute the median of the data
            //System.out.println("The median of the data is " + median);
            //mode = mode();                // Compute the mode of the data
            //System.out.println("The mode of the data is " + mode);
            variance = variance();          // Compute the variance of the data
            //System.out.println("The variance of the data is " + variance);
            stdDev = stdDev();              // Compute the standard deviation
            //System.out.println("The standard deviation of the data is " + stdDev);
            skewness = skewness();          // Compute skewness of data
            //System.out.println("The skewness of the data is " + skew);
            kurtosis = kurtosis();          // Compute kurtosis of data
            //System.out.println("The kurtosis of the data is " + kurtosis);
            confidence95 = confidence95();  // Compute 95% CI of mean
            //System.out.println("The 95% CI of mean of the data is " + confidence95);

            // Method for printing the Data Object
            // PrintDataObj();

            // Close input stream
            DataIn.close();
        }
        // catch for stream creation exception
        catch (FileNotFoundException e)
        {
            System.err.println(e);
            return;
        }
        // catch for file read exception
        catch (IOException e)
        {
            System.err.println("Error reading file " + e);
            return;
        }

        return;
    } // end DataObject()

    //____________________________________________________________
    /**
     * get the minimum of the data in the Array
     * @author MAJ Tom Cook
     */
    //____________________________________________________________
    public double min()
    {
        double minimum = 999999999.9;
        double tempMin = 0.0;

        for (int i = 0; i < numDataElem; ++i)
        {
            tempMin = data[i];
            if (tempMin <= minimum)
            {
                minimum = tempMin;
            }
        }
        return minimum;
    } // end min

    //____________________________________________________________
    /**
     * get the maximum of the data in the Array
     * @author MAJ Tom Cook
     */
    //____________________________________________________________
    public double max()
    {
        double maximum = -999999999.9;
        double tempMax = 0.0;

        for (int i = 0; i < numDataElem; ++i)
        {
            tempMax = data[i];
            if (tempMax >= maximum)
            {
                maximum = tempMax;
            }
        }
        return maximum;
    } // end max

    //____________________________________________________________
    /**
     * get the range of the data in the Array
     * @author MAJ Tom Cook
     */
    //____________________________________________________________
    public double range()
    {
        return Math.abs(max() - min());
    } // end range

    //____________________________________________________________
    /**
     * compute the sum of the data in the Array
     * @author MAJ Tom Cook
     */
    //____________________________________________________________
    public double sum()
    {
        double sumData = 0;

        for (int i = 0; i < numDataElem; ++i)
        {
            sumData += data[i];
        }
        return sumData;
    } // end sum

    //} // end Class DataObject

    //____________________________________________________________
    /**
     * compute the mean of the data in the Array
     * @author MAJ Tom Cook
     */
    //____________________________________________________________
    public double mean()
    {
        double sumData = sum();
        return sumData/numDataElem;
    } // end mean

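    //____________________________________________________________
    /**
     * compute the median of the data in the Array
     * (the array must already be sorted in ascending order)
     * @author MAJ Tom Cook
     */
    //____________________________________________________________
    public double median()
    {
        if (numDataElem <= 1)
        {
            return data[0];
        }
        if (numDataElem%2 == 0)
        {
            // even count: average of the two middle elements
            double answer = 0.0;
            answer = (data[(numDataElem/2) - 1] + data[numDataElem/2])/2;
            return answer;
        }
        else
            // odd count: the single middle element
            return data[numDataElem/2];
    } // end median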

// _ 

/**
 * Compute the mode of the data in the array.
 * @author MAJ Tom Cook
 */
// _
// public double mode()
// {
//     return count;
// } // mode

// _ 

/**
 * Compute the (sample) variance of the data in the array.
 * @author MAJ Tom Cook
 */
// _
public double variance()
{
    double sumXiMinusMeanSqrd = 0.0;

    for (int i = 0; i < numDataElem; ++i)
    {
        sumXiMinusMeanSqrd += Math.pow(data[i] - mean(), 2);
    }
    // divide by n - 1: sample variance, not population variance
    return 1.0 / (numDataElem - 1) * sumXiMinusMeanSqrd;
} // end variance

// _ 

/**
 * Compute the stdDev of the data in the array.
 * @author MAJ Tom Cook
 */
// _
public double stdDev()
{
    double variance = variance();
    return Math.sqrt(variance);
} // end stdDev

// _ 

/**
 * Compute the skewness of the data in the array.
 * @author MAJ Tom Cook
 */
// _
public double skewness()
{
    double sumXiMinusMeanDivByStdDevCubed = 0.0;

    for (int i = 0; i < numDataElem; ++i)
    {
        sumXiMinusMeanDivByStdDevCubed += Math.pow(((data[i] - mean) / stdDev), 3);
    }

    return (numDataElem / ((numDataElem - 1.0) * (numDataElem - 2.0)))
            * sumXiMinusMeanDivByStdDevCubed;
} // end skewness


// _ ^ _ 

/**
 * Compute the kurtosis of the data in the array.
 * @author MAJ Tom Cook
 */
// _
public double kurtosis()
{
    double sumXiMinusMeanDivByStdDev4th = 0.0;

    for (int i = 0; i < numDataElem; ++i)
    {
        sumXiMinusMeanDivByStdDev4th += Math.pow(((data[i] - mean) / stdDev), 4);
    }

    return ((numDataElem * (numDataElem + 1.0)
            / ((numDataElem - 1.0) * (numDataElem - 2.0) * (numDataElem - 3.0))
            * sumXiMinusMeanDivByStdDev4th)
            - ((3.0 * (Math.pow(numDataElem - 1.0, 2)))
            / ((numDataElem - 2.0) * (numDataElem - 3.0))));
} // end kurtosis


// _ 

/**
 * Compute the confidence95 of the data in the array.
 * @author MAJ Tom Cook
 */
// _
public double confidence95()
{
    double tableValue = 1.96;

    return tableValue * (stdDev / Math.sqrt(numDataElem - 1.0));
} // end confidence95

// _ 

/**
 * Print the data object.
 * @author MAJ Tom Cook
 */
// _
public void PrintDataObj()
{
    System.out.println();

    for (int i = 0; i < numDataElem; ++i)
        System.out.println(data[i] + ", ");

    System.out.println("\n The Data object has " + numDataElem + " elements in its array");
    // System.out.println("\n and was created at " + timeObjBuilt + ".");
    //System.out.println("\n and was created at " + rightNow + ".");
    System.out.println(" _ ");
    System.out.println("\n DESCRIPTIVE STATISTICS ");
    System.out.println(" _ ");
    System.out.println(" Minimum = " + min + " ");
    System.out.println(" Maximum = " + max + " ");
    System.out.println(" Range = " + range + " ");
    System.out.println(" Sum = " + sum + " ");
    System.out.println(" Mean = " + mean + " ");
    System.out.println(" median = " + median + " ");
    // System.out.println(" mode = " + mode + " ");
    System.out.println(" Variance = " + variance + " ");
    System.out.println(" Std Dev = " + stdDev + " ");
    System.out.println(" Skewness = " + skewness + " ");
    System.out.println(" Kurtosis = " + kurtosis + " ");
    System.out.println(" 95% CI = " + confidence95
            + " NOTE: conducted with sample StdDev NOT population StdDev ");
} // end PrintDataObj


}  //  end  Class  DataObject 
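
The median() method in the DataObject listing reads the middle of the data array directly, so its result depends on the array already being in ascending order, and variance() calls mean() once per element. The following is a minimal sketch of both computations with those two points made explicit. It assumes a hypothetical helper class SampleStats that works on a plain double[]; neither the class nor its method names come from the thesis code.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the thesis listing. */
public class SampleStats {

    /** Median of the values; sorts a copy so the caller's array is untouched. */
    public static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no data");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        int mid = n / 2;
        // even count: average the two middle elements; odd count: take the middle one
        return (n % 2 == 0) ? (sorted[mid - 1] + sorted[mid]) / 2.0 : sorted[mid];
    }

    /** Arithmetic mean of the values. */
    public static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample variance (divides by n - 1); the mean is computed once. */
    public static double variance(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq / (values.length - 1);
    }
}
```

Sorting a copy costs O(n log n) per call; if the data array is sorted once when the object is built, the direct index lookup used in the listing is sufficient.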


MyFileIoClass

/**
 * myFileIoClass
 * @author MAJ Tom Cook
 * @version JDK 1.2
 * This class is for all file IO done in this program except for the
 * initialization of the data vector.
 */
// _

import java.io.*;
import java.util.*;

// _

/** Class to open and retrieve data from a file */
public class myFileIoClass
{
//

/**  Method: readWeight
 *  Return Value: double weight
 *  Parameter: na
 *  Purpose: This method reads a weight from a file
 */
public static double readWeight() {

    Vector alpha = new Vector();
    double weight = 0.0;
    double vectorSize = 0.0;
    int counter = 0;
    boolean EOF = false;  // boolean to determine the end of a file

    try
    {
        // Create a file object and an input stream object for the file
        String directory = "c:/Data/dirichlet/";  // Directory path
        String fileName = "weight.txt";           // File name
        File myData = new File(directory, fileName);
        BufferedReader testin = new BufferedReader(new FileReader(myData));

        for (String nextLine = testin.readLine(); nextLine != null;
             nextLine = testin.readLine())
        {
            Double nextValue = Double.valueOf(nextLine);
            alpha.addElement(nextValue);
        }
        testin.close();  // Close the input stream
    }
    catch (FileNotFoundException e)  // Stream creation exception
    {
        System.err.println(e);
        System.exit(1);  // End the program
    }
    catch (IOException e)  // File read exception
    {
        System.err.println(" Error reading input file" + e);
        System.exit(1);  // End the program
    }

    vectorSize = alpha.size();

    // the last value in the file is the one returned
    for (counter = 0; counter < vectorSize; counter++)
    {
        weight = (((Double) alpha.get(counter)).doubleValue());
    }
    //System.out.println(weight);

    return weight;
} // End for readWeight


// 


/**  Method: readCdfValue
 *  Return Value: double cdfValue
 *  Parameter: na
 *  Purpose: This method reads a cdfValue from a file
 */
public static double readCdfValue() {

    Vector cdfValue = new Vector();
    double cdfVal = 0.0;
    double vectorSize = 0.0;
    int counter = 0;
    boolean EOF = false;  // boolean to determine the end of a file

    try
    {
        // Create a file object and an input stream object for the file
        String directory = "c:/Data/dirichlet/";  // Directory path
        String fileName = "cdfValue.txt";         // File name
        File myData = new File(directory, fileName);
        BufferedReader testin = new BufferedReader(new FileReader(myData));

        for (String nextLine = testin.readLine(); nextLine != null;
             nextLine = testin.readLine())
        {
            Double nextValue = Double.valueOf(nextLine);
            cdfValue.addElement(nextValue);
        }
        testin.close();  // Close the input stream
    }
    catch (FileNotFoundException e)  // Stream creation exception
    {
        System.err.println(e);
        System.exit(1);  // End the program
    }
    catch (IOException e)  // File read exception
    {
        System.err.println(" Error reading input file" + e);
        System.exit(1);  // End the program
    }

    vectorSize = cdfValue.size();

    // the last value in the file is the one returned
    for (counter = 0; counter < vectorSize; counter++)
    {
        cdfVal = (((Double) cdfValue.get(counter)).doubleValue());
    }
    //System.out.println(cdfVal);

    return cdfVal;
} // End for cdfValue


//

/**  Method: readStdDevGuess
 *  Return Value: double stdDevVal
 *  Parameter: na
 *  Purpose: This method reads a Standard Deviation estimation from a file
 */
public static double readStdDevGuess() {

    Vector stdDev = new Vector();
    double stdDevVal = 0.0;
    double vectorSize = 0.0;
    int counter = 0;
    boolean EOF = false;  // boolean to determine the end of a file

    try
    {
        // Create a file object and an input stream object for the file
        String directory = "c:/Data/dirichlet/";  // Directory path
        String fileName = "stdDevGuess.txt";      // File name
        File myData = new File(directory, fileName);
        BufferedReader testin = new BufferedReader(new FileReader(myData));

        for (String nextLine = testin.readLine(); nextLine != null;
             nextLine = testin.readLine())
        {
            Double nextValue = Double.valueOf(nextLine);
            stdDev.addElement(nextValue);
        }
        testin.close();  // Close the input stream
    }
    catch (FileNotFoundException e)  // Stream creation exception
    {
        System.err.println(e);
        System.exit(1);  // End the program
    }
    catch (IOException e)  // File read exception
    {
        System.err.println(" Error reading input file" + e);
        System.exit(1);  // End the program
    }

    vectorSize = stdDev.size();

    // the last value in the file is the one returned
    for (counter = 0; counter < vectorSize; counter++)
    {
        stdDevVal = (((Double) stdDev.get(counter)).doubleValue());
    }
    //System.out.println(stdDevVal);

    return stdDevVal;
} // End for readStdDevGuess()


// 


/**  Method: readMeanGuess
 *  Return Value: double meanGuess
 *  Parameter: na
 *  Purpose: This method reads a mean estimation from a file
 */
public static double readMeanGuess() {

    Vector mean = new Vector();
    double meanVal = 0.0;
    double vectorSize = 0.0;
    int counter = 0;
    boolean EOF = false;  // boolean to determine the end of a file

    try
    {
        // Create a file object and an input stream object for the file
        String directory = "c:/Data/dirichlet/";  // Directory path
        String fileName = "meanGuess.txt";        // File name
        File myData = new File(directory, fileName);
        BufferedReader testin = new BufferedReader(new FileReader(myData));

        for (String nextLine = testin.readLine(); nextLine != null;
             nextLine = testin.readLine())
        {
            Double nextValue = Double.valueOf(nextLine);
            mean.addElement(nextValue);
        }
        testin.close();  // Close the input stream
    }
    catch (FileNotFoundException e)  // Stream creation exception
    {
        System.err.println(e);
        System.exit(1);  // End the program
    }
    catch (IOException e)  // File read exception
    {
        System.err.println(" Error reading input file" + e);
        System.exit(1);  // End the program
    }

    vectorSize = mean.size();

    // the last value in the file is the one returned
    for (counter = 0; counter < vectorSize; counter++)
    {
        meanVal = (((Double) mean.get(counter)).doubleValue());
    }
    //System.out.println(meanVal);

    return meanVal;
} // End for readMeanGuess()


} // End of myFileIoClass
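
The four read methods in the listing above differ only in the file name: each reads one number per line and returns the last one. A minimal sketch of how that shared logic could be factored into one routine follows. The class LastValueReader and its method names are hypothetical and do not appear in the thesis code. Unlike the listing, the sketch skips blank lines and throws an unchecked exception instead of calling System.exit, so treat it as an illustration of the pattern rather than a drop-in replacement.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical helper, not part of the thesis listing. */
public class LastValueReader {

    /** Reads one number per line and returns the last one (0.0 if there are none). */
    public static double lastValue(Reader source) {
        double last = 0.0;
        // try-with-resources closes the reader even if parsing fails part-way
        try (BufferedReader in = new BufferedReader(source)) {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    last = Double.parseDouble(trimmed);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Error reading input", e);
        }
        return last;
    }

    /** Opens directory/fileName and returns the last number stored in it. */
    public static double readLastValue(String directory, String fileName) {
        try {
            return lastValue(new FileReader(new File(directory, fileName)));
        } catch (IOException e) {
            throw new UncheckedIOException("Error opening " + fileName, e);
        }
    }
}
```

With such a helper, a method like readWeight() would reduce to a single call such as `LastValueReader.readLastValue("c:/Data/dirichlet/", "weight.txt")`.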


MyStatsProgram

/**
 * Main
 * @author MAJ Tom Cook
 * @version JDK 1.2
 * This program creates a StatTest object. The StatTest Object consists of three
 * Statistical tests,
 * the Kolmogorov-Smirnov Test, the Anderson-Darling Test, and the Dirichlet
 * process. Each test takes a
 * Data Object as input and returns the results of the individual test.
 */


// 


import java.io.*;
import java.util.*;

// _

public class MyStatsProgram
{
    public static stdNormalTable STDNORMALTABLE = new stdNormalTable();

    // MAIN PROGRAM
    // _

    public static void main(String args[]) throws IOException
    {
        DataObject myDataObject = new DataObject();
        //StatTest.ksTestNormal(myDataObject);
        //StatTest.ksTestExpo(myDataObject);
        //StatTest.adNormal(myDataObject);
        //StatTest.adExpo(myDataObject);
        StatTest.dirichletTest(myDataObject);
    }

} // end MyStatsProgram
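
The Kolmogorov-Smirnov test named in the header comment compares the empirical CDF of a sample with a hypothesized CDF. The sketch below shows that computation in isolation: the statistic Dn as the larger of Dn+ and Dn-, and the adjustment (sqrt(n) - 0.01 + 0.85/sqrt(n)) * Dn that the StatTest listing applies for the normal case. The class KsSketch, its method names, and the use of a DoubleUnaryOperator for the hypothesized CDF are assumptions made for illustration; the thesis code instead looks the normal CDF up in its stdNormalTable class.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper, not part of the thesis listing. */
public class KsSketch {

    /**
     * K-S statistic Dn = max(Dn+, Dn-), where
     * Dn+ = max over i of (i/n - F(x_i)) and Dn- = max over i of (F(x_i) - (i-1)/n),
     * with x_1 <= ... <= x_n the sorted sample and F the hypothesized CDF.
     */
    public static double statistic(double[] sample, DoubleUnaryOperator cdf) {
        double[] x = sample.clone();
        Arrays.sort(x);  // the formula is defined on the order statistics
        int n = x.length;
        double dn = 0.0;
        for (int i = 1; i <= n; i++) {
            double f = cdf.applyAsDouble(x[i - 1]);
            double dPlus = (double) i / n - f;
            double dMinus = f - (double) (i - 1) / n;
            dn = Math.max(dn, Math.max(dPlus, dMinus));
        }
        return dn;
    }

    /** Adjusted statistic for the normal case with estimated mean and variance. */
    public static double adjustedForNormal(double dn, int n) {
        double rootN = Math.sqrt(n);
        return (rootN - 0.01 + 0.85 / rootN) * dn;
    }
}
```

For example, `KsSketch.statistic(new double[] {0.7, 0.1, 0.4}, x -> x)` tests three points against a uniform(0,1) CDF; the adjusted value would then be compared against a modified critical value such as the 0.895 the listing uses for 1 - alpha = .95.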


StatTest 

/**
 * StatTest
 * @author MAJ Tom Cook
 * @version JDK 1.2
 * This program creates a StatTest object. The StatTest Object consists of three
 * Statistical tests,
 * the Kolmogorov-Smirnov Test, the Anderson-Darling Test, and the Dirichlet
 * process. Each test takes a
 * Data Object as input and returns the results of the individual test.
 */


// 


import java.io.*;
import java.util.*;

// _ 


public  class  StatTest 

{ 

// _ 


//

/**  Method: ksTestNormal
 *  Return Value: boolean result
 *  Parameter: DataObject data
 *  Purpose: This test allows one to test if a given sample of n observations is
 *  from a Normal distribution. It is based on the observation that the difference
 *  between the OBSERVED Cumulative Distribution Function (CDF) and the EXPECTED
 *  CDF should be small. The K-S statistic Dn is the largest vertical distance
 *  between Fn(x) and F^(x) for all values of x and is defined as
 *  Dn = supremum{ |Fn(x) - F^(x)| }.
 *  Dn is calculated as follows:
 *  Dn+ = max for (1<=i<=n) {(i/n) - F^(X(i))},
 *  Dn- = max for (1<=i<=n) {F^(X(i)) - ((i-1)/n)},
 *  Dn  = max {Dn+, Dn-}
 */


public static boolean ksTestNormal(DataObject data)
{
    double x          = 0.0;
    double z          = 0.0;
    double xMinusMean = 0.0;
    double tableValue = 0.0;
    double value      = 0.0;
    double maxPos     = 0.0;
    double maxNeg     = 0.0;
    double DnPos      = -9999999.0;
    double DnNeg      = -9999999.0;
    double Dn         = -9999999.0;
    double tempDn     = 0.0;
    double const4     = 0.01;
    double const5     = 0.85;
    int zrow          = 0;
    int zcolumn       = 0;
    double normal     = 0.0;
    boolean result;

    /** For Hypothesized Distribution N(pop mean, pop Var) both parameters
     *  are unknown, so we estimate them using Xbar(n) and Ssqrd(n). H0 is rejected
     *  if the adjusted statistic (sqrt n - 0.01 + (0.85/sqrt n)) Dn is > C 1-alpha,
     *  where C 1-alpha is a table look up. */

    for (int counter = 0; counter < data.numDataElem; counter++)
    {
        // get data element from array, subtract the mean, and divide the
        // result by the standard deviation
        // System.out.println(data.data[counter]);
        z = ((data.data[counter] - data.mean) / data.stdDev);
        // System.out.println(" z " + z);
        zrow = (int) (z * 10);
        zcolumn = (int) ((z * 100) % 10);

        if (z < -3.49)
        {
            tableValue = .0002;
        }
        else if (z > 3.49)
        {
            tableValue = .9998;
        }
        else if (z < 0 && z >= -3.49)
        {
            // the table holds non-negative z only; use symmetry for negative z
            value = stdNormalTable.lookUpValue(Math.abs(zrow), Math.abs(zcolumn));
            tableValue = 1.0 - value;
        }
        else
        {
            tableValue = stdNormalTable.lookUpValue(Math.abs(zrow), Math.abs(zcolumn));
        }
        // System.out.println(" tableValue " + tableValue);

        maxPos = ((double) (counter + 1) / data.numDataElem) - tableValue;
        // System.out.println(" maxPos " + maxPos);
        maxNeg = tableValue - (((counter + 1) - 1) / (double) data.numDataElem);
        // System.out.println(" maxNeg " + maxNeg);

        tempDn = Math.max(maxPos, maxNeg);
        Dn = Math.max(Dn, tempDn);
    }

    //System.out.println(" tempDn = " + tempDn + "\n");
    //System.out.println(" Dn = " + Dn + "\n");

    normal = (Math.sqrt(data.numDataElem) - const4
            + const5 / Math.sqrt(data.numDataElem)) * Dn;

    //System.out.println(" normal = " + normal + "\n");

    result = CriticalValuesKsNormal(normal);
    System.out.println(result);

    return result;
} // end ksTestNormal


// 


/**  Method: CriticalValuesKsNormal
 *  Return Value: boolean (false if the null hypothesis is rejected)
 *  Parameter: double adjTestStat
 *  Purpose: This method compares the passed value against the modified critical
 *  values in the table. The alpha values are: .15, .10, .05, .025, and .01;
 *  1 - alpha then equals .85, .90, .95, .975, and .99
 */

public static boolean CriticalValuesKsNormal(double adjTestStat)
{
    double critValDotEightFiveZero = 0.775;
    double critValDotNineZeroZero = 0.819;
    double critValDotNineFiveZero = 0.895;
    double critValDotNineSevenFive = 0.955;
    double critValDotNineNineZero = 1.035;

    if (adjTestStat >= critValDotNineNineZero)
    {
        System.out.println("\n The adjusted K-S test statistic " + adjTestStat
                + " is greater than the modified critical value " + critValDotNineNineZero + "\n");
        System.out.println(" Therefore reject the null hypothesis that the data is"
                + " from a normal distribution using a 99% CI \n");
        return false;
    }
    else if (adjTestStat <= critValDotNineNineZero && adjTestStat > critValDotNineSevenFive)
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineNineZero
                + " but greater than the modified critical value " + critValDotNineSevenFive + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a normal distribution using a 99% CI \n");
        return true;
    }
    else if (adjTestStat <= critValDotNineSevenFive && adjTestStat > critValDotNineFiveZero)
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineSevenFive
                + " but greater than the modified critical value " + critValDotNineFiveZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a normal distribution using a 97.5% CI \n");
        return true;
    }
    else if (adjTestStat <= critValDotNineFiveZero && adjTestStat > critValDotNineZeroZero)
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineFiveZero
                + " but greater than the modified critical value " + critValDotNineZeroZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a normal distribution using a 95% CI \n");
        return true;
    }
    else if (adjTestStat <= critValDotNineZeroZero && adjTestStat > critValDotEightFiveZero)
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineZeroZero
                + " but greater than the modified critical value " + critValDotEightFiveZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a normal distribution using a 90% CI \n");
        return true;
    }
    else  // adjTestStat <= critValDotEightFiveZero
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotEightFiveZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a normal distribution using an 85% CI \n");
        return true;
    }
} // End for CriticalValuesKsNormal


// _ 

/**  Method: ksTestExpo
 *  Return Value: boolean result
 *  Parameter: DataObject data
 *  Purpose: This test allows one to test if a given sample of n observations is
 *  from an Exponential distribution. It is based on the observation that the
 *  difference between the OBSERVED Cumulative Distribution Function (CDF) and the
 *  EXPECTED CDF should be small. The K-S statistic Dn is the largest vertical
 *  distance between Fn(x) and F^(x) for all values of x and is defined as
 *  Dn = supremum{ |Fn(x) - F^(x)| }.
 *  Dn is calculated as follows:
 *  Dn+ = max for (1<=i<=n) {(i/n) - F^(X(i))},
 *  Dn- = max for (1<=i<=n) {F^(X(i)) - ((i-1)/n)},
 *  Dn  = max {Dn+, Dn-}
 */


public static boolean ksTestExpo(DataObject data) {

    double oneMinusEtoExp = 0.0;
    double tableValue     = 0.0;
    double value          = 0.0;
    double maxPos         = 0.0;
    double maxNeg         = 0.0;
    double DnPos          = -9999999.0;
    double DnNeg          = -9999999.0;
    double Dn             = -9999999.0;
    double tempDn         = 0.0;
    double DnExpo         = 0.0;
    double const1         = 0.2;
    double const2         = 0.26;
    double const3         = 0.5;
    double expo           = 0.0;
    boolean result;

    //System.out.println(" I'm in ksTest \n");

    /** For Hypothesized Distribution expo(Beta) with Beta unknown,
     *  Beta is estimated by its MLE Xbar(n), and F^ is defined to be the
     *  expo(Xbar(n)) dist Func.
     *  H0 is rejected if (Dn - (0.2/n)) (sqrt n + 0.26 + (0.5/sqrt n)) > C'' 1-alpha,
     *  where C'' 1-alpha is a table look up.
     */


    // Dn gets the value of the KS statistic calculated from the data
    for (int counter = 0; counter < data.numDataElem; counter++)
    {
        oneMinusEtoExp = (1.0 - Math.pow(Math.E, (-(data.data[counter]) / data.mean)));
        //System.out.println("1 - e** -x/mean = " + oneMinusEtoExp);

        maxPos = ((double) (counter + 1) / data.numDataElem) - oneMinusEtoExp;
        //System.out.println("maxPos = " + maxPos);

        maxNeg = (oneMinusEtoExp - (double) ((counter + 1) - 1) / data.numDataElem);
        //System.out.println("maxNeg = " + maxNeg);

        tempDn = Math.max(maxPos, maxNeg);
        Dn = Math.max(Dn, tempDn);
        //System.out.println("Dn = " + Dn);
    }

    expo = (Dn - (const1 / data.numDataElem))
            * (Math.sqrt(data.numDataElem) + const2 + (const3 / Math.sqrt(data.numDataElem)));

    //System.out.println(" the adjusted statistic = " + expo + " \n");

    // CriticalValuesKsExpo(expo) looks up the result of the K-S statistic to
    // determine whether to reject or fail to reject the null hypothesis at
    // alpha = .15, .1, .05, .025, and .01
    result = CriticalValuesKsExpo(expo);

    return result;
} // end ksTestExpo


// 
/**  Method: CriticalValuesKsExpo
 *  Return Value: boolean (false if the null hypothesis is rejected)
 *  Parameter: double adjTestStat
 *  Purpose: This method compares the passed value against the modified critical
 *  values in the table. The alpha values are: .15, .10, .05, .025, and .01;
 *  1 - alpha then equals .85, .90, .95, .975, and .99
 */

public static boolean CriticalValuesKsExpo(double adjTestStat)
{
    double critValDotEightFiveZero = 0.926;
    double critValDotNineZeroZero = 0.990;
    double critValDotNineFiveZero = 1.094;
    double critValDotNineSevenFive = 1.190;
    double critValDotNineNineZero = 1.308;

    if (adjTestStat >= critValDotNineNineZero)
    {
        System.out.println("\n The adjusted K-S test statistic " + adjTestStat
                + " is greater than the modified critical value " + critValDotNineNineZero + "\n");
        System.out.println(" Therefore reject the null hypothesis that the data is"
                + " from an exponential distribution using a 99% CI \n");
        return false;
    }
    else if (adjTestStat <= critValDotNineNineZero && adjTestStat > critValDotNineSevenFive)
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineNineZero
                + " but greater than the modified critical value " + critValDotNineSevenFive + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from an exponential distribution using a 99% CI \n");
        return true;
    }
    else if (adjTestStat <= critValDotNineSevenFive && adjTestStat > critValDotNineFiveZero)
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineSevenFive
                + " but greater than the modified critical value " + critValDotNineFiveZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from an exponential distribution using a 97.5% CI \n");
        return true;
    }
    else if (adjTestStat <= critValDotNineFiveZero && adjTestStat > critValDotNineZeroZero)
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineFiveZero
                + " but greater than the modified critical value " + critValDotNineZeroZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from an exponential distribution using a 95% CI \n");
        return true;
    }
    else if (adjTestStat <= critValDotNineZeroZero && adjTestStat > critValDotEightFiveZero)
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineZeroZero
                + " but greater than the modified critical value " + critValDotEightFiveZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from an exponential distribution using a 90% CI \n");
        return true;
    }
    else  // adjTestStat <= critValDotEightFiveZero
    {
        System.out.println(" \n The adjusted K-S test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotEightFiveZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from an exponential distribution using an 85% CI \n");
        return true;
    }
} // End for CriticalValuesKsExpo

//  _ 


// 


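/**  Method: adNormal
 *  Return Value: boolean result
 *  Parameter: DataObject data
 *  Purpose: Computes the adjusted Anderson-Darling statistic for a normal
 *  distribution and compares it against the modified critical values.
 */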


public static boolean adNormal(DataObject data)
{
    double ADNormalResult = 0.0;
    double ADNStatistic = 0.0;
    double Adjustment = 0.0;
    double s = 0.0;
    double d = 0.0;
    double q = 0.0;
    double xMinusExpectedMean2;
    double Isum = 0.0;
    double sum = 0.0;
    double ADStat = 0.0;
    double value = 0.0;
    double value2 = 0.0;
    double tableValue = 0.0;
    double tableValue2 = 0.0;
    double tableResult = 0.0;
    double tableResult2 = 0.0;
    double z = 0.0;
    double zero = 0.0;
    int ten = 10;
    int hundred = 100;
    int f = 0;
    int p = 0;
    int zrow = 0;
    int zcolumn = 0;
    int qrow = 0;
    int qcolumn = 0;
    boolean result;


    for (int counter = 1; counter <= data.numDataElem; counter++)
    {
        f = (2 * counter - 1);

        // Z(i): standardized i-th observation and its normal CDF value
        z = ((data.data[counter - 1] - data.mean) / data.stdDev);
        //System.out.println(" z " + z);
        zrow = (int) (z * 10);
        zcolumn = (int) ((z * 100) % 10);

        if (z < -3.49)
        {
            tableValue = .0002;
        }
        else if (z > 3.49)
        {
            tableValue = .9998;
        }
        else if (z < zero && z >= -3.49)
        {
            value = stdNormalTable.lookUpValue(Math.abs(zrow), Math.abs(zcolumn));
            tableValue = 1 - value;
        }
        else
        {
            tableValue = stdNormalTable.lookUpValue(Math.abs(zrow), Math.abs(zcolumn));
        }

        tableResult = tableValue;
        //System.out.println(" tableResult = " + tableResult + "\n");

        // Z(n+1-i): the same lookup for the mirrored observation
        p = ((data.numDataElem + 1) - counter - 1);
        q = (data.data[p] - data.mean) / data.stdDev;
        //System.out.println(" The q = " + q + "\n");
        qrow = (int) (q * ten);
        //System.out.println(" qrow = " + qrow + "\n");
        qcolumn = (int) ((q * hundred) % ten);
        //System.out.println(" qcolumn = " + qcolumn + "\n");

        if (q < -3.49)
        {
            tableValue2 = .0002;
        }
        else if (q > 3.49)
        {
            tableValue2 = .9998;
        }
        else if (q < zero && q >= -3.49)
        {
            value = stdNormalTable.lookUpValue(Math.abs(qrow), Math.abs(qcolumn));
            tableValue2 = 1 - value;
        }
        else
        {
            tableValue2 = stdNormalTable.lookUpValue(Math.abs(qrow), Math.abs(qcolumn));
        }

        tableResult2 = tableValue2;
        //System.out.println(" tableResult2 = " + tableResult2 + "\n");

        Isum = f * (Math.log(tableResult) + Math.log(1.0 - tableResult2));
        sum = sum + Isum;
        //System.out.println(" sum = " + sum + "\n");
        //System.out.println(" f = " + f + "\n");
    }

    // compute the final statistic (-(sum)/n) - n
    ADStat = ((-sum) / data.numDataElem) - data.numDataElem;
    //System.out.println(" sum = " + sum + "\n");
    //System.out.println(" ADStat = " + ADStat + "\n");

    // Calculates the adjusted Anderson-Darling statistic for a normal
    // distribution and stores the value in ADNormalResult
    Adjustment = 1.0 + (4.0 / data.numDataElem)
            - (25.0 / (data.numDataElem * data.numDataElem));
    //System.out.println(" Adjustment = " + Adjustment + "\n");

    ADNormalResult = ADStat * Adjustment;
    //System.out.println(" ADNormalResult = " + ADNormalResult + "\n");

    // CriticalValuesAndersonNormal(ADNormalResult) looks up and determines
    // whether to reject or fail to reject the null hypothesis at
    // alpha = .1, .05, .025, and .01
    result = CriticalValuesAndersonNormal(ADNormalResult);

    return result;
} // end adNormal


// 


/**  Method: CriticalValuesAndersonNormal
 *  Return Value: boolean (false if the null hypothesis is rejected)
 *  Parameter: double adjTestStat
 *  Purpose: This method compares the passed value against the modified critical
 *  values in the table. The modified critical values are .632, .751, .870,
 *  and 1.029, for 1 - alpha equal to .90, .95, .975, and .99 respectively.
 */

public static boolean CriticalValuesAndersonNormal(double adjTestStat)
{
    double critValDotNineZeroZero = .632;
    double critValDotNineFiveZero = .751;
    double critValDotNineSevenFive = .870;
    double critValDotNineNineZero = 1.029;

    if (adjTestStat > critValDotNineNineZero)
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
                + " is greater than the modified critical value " + critValDotNineNineZero + "\n");
        System.out.println(" Therefore reject the null hypothesis that the data is"
                + " from a Normal distribution using a 99% CI");
        return false;
    }
    else if (adjTestStat <= critValDotNineNineZero && adjTestStat > critValDotNineSevenFive)
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineNineZero
                + " but greater than the modified critical value " + critValDotNineSevenFive + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a Normal distribution using a 99% CI");
        return true;
    }
    else if (adjTestStat <= critValDotNineSevenFive && adjTestStat > critValDotNineFiveZero)
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineSevenFive
                + " but greater than the modified critical value " + critValDotNineFiveZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a Normal distribution using a 97.5% CI \n");
        return true;
    }
    else if (adjTestStat <= critValDotNineFiveZero && adjTestStat > critValDotNineZeroZero)
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineFiveZero
                + " but greater than the modified critical value " + critValDotNineZeroZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a Normal distribution using a 95% CI \n");
        return true;
    }
    else  // adjTestStat <= critValDotNineZeroZero
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
                + " is less than the modified critical value " + critValDotNineZeroZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis that the"
                + " data is from a Normal distribution using a 90% CI \n");
        return true;
    }
} // End for CriticalValuesAndersonNormal

// _ 


// 


/** Method: AndersonDarlingExpo
 * Return Value: boolean result (outcome of the critical-value comparison)
 * Parameter: DataObject data
 * Purpose: Computes the Anderson-Darling statistic for an exponential distribution
 * based on the following formula:
 * ADStat = (-{sum from i = 1 to n of (2i-1)[ln Z_i + ln(1 - Z_(n+1-i))]}/n) - n
 * and compares the adjusted statistic against the modified critical values.
 */


public static boolean adExpo(DataObject data)
{
    double oneMinuseRaisedToexp = 0.0;   // Z_i = 1 - e^(-x_i/mean)
    double oneMinuseRaisedToexp1 = 0.0;
    double oneMinuseRaisedToexp2 = 0.0;  // 1 - Z_(n+1-i) = e^(-x_(n+1-i)/mean)
    double one = 1.0;
    double lsum = 0.0;
    double sum = 0.0;
    double ADStat = 0.0;
    double AdjustedADStat = 0.0;
    int z = 0;
    int f = 0;
    boolean result;

    // loop through array to compute sum from i = 1 to n of (2i-1)[ln Z_i + ln(1 - Z_(n+1-i))]
    // (the formula is defined on the order statistics, i.e. data in ascending order)
    for (int counter = 1; counter <= data.numDataElem; counter++)
    {
        f = (2*counter - 1);

        if (data.data[counter - 1] <= 0)
        {
            oneMinuseRaisedToexp = 0;
        }
        else
        {
            oneMinuseRaisedToexp = one - (Math.pow(Math.E, -(data.data[counter - 1])/data.mean));
            //System.out.println(oneMinuseRaisedToexp);
        }

        z = ((data.numDataElem + 1) - counter - 1);
        //System.out.println(z);
        if (data.data[z] <= 0)
        {
            oneMinuseRaisedToexp2 = 0;
        }
        else
        {
            oneMinuseRaisedToexp2 = one - (one - (Math.pow(Math.E, -(data.data[z])/data.mean)));
            //System.out.println(oneMinuseRaisedToexp2);
        }

        // skip any term whose logarithm would be undefined
        if (oneMinuseRaisedToexp <= 0 && oneMinuseRaisedToexp2 <= 0)
        {
            lsum = 0.0;
        }
        else if (oneMinuseRaisedToexp <= 0)
        {
            lsum = f * (Math.log(oneMinuseRaisedToexp2));
        }
        else if (oneMinuseRaisedToexp2 <= 0)
        {
            lsum = f * (Math.log(oneMinuseRaisedToexp));
        }
        else
        {
            lsum = f * (Math.log(oneMinuseRaisedToexp) + Math.log(oneMinuseRaisedToexp2));
        }

        //System.out.println(" lsum = " + lsum + "\n");
        sum = sum + lsum;
        //System.out.println(" sum = " + sum + "\n");
        //System.out.println(" f = " + f + "\n");
    }

    // compute the final statistic (-(sum)/n) - n
    ADStat = ((-sum)/data.numDataElem) - data.numDataElem;
    //System.out.println(" ADStat = " + ADStat + "\n");

    AdjustedADStat = (1.0 + (0.6/data.numDataElem)) * ADStat;
    //System.out.println(" AdjustedADStat = " + AdjustedADStat + "\n");

    result = CriticalValuesAndersonExpo(AdjustedADStat);

    return result;

} // End for AndersonDarlingExpo


//


/** Method: CriticalValuesAndersonExpo
 * Return Value: boolean
 * Parameter: double adjTestStat
 * Purpose: This method compares the passed value against the modified critical
 * values in the table. The critical values are: 1.070, 1.326, 1.587, and 1.943
 * 1 - alpha then equals .90, .95, .975, and .99
 *
 */

public static boolean CriticalValuesAndersonExpo(double adjTestStat)
{
    double critValDotNineZeroZero = 1.070;
    double critValDotNineFiveZero = 1.326;
    double critValDotNineSevenFive = 1.587;
    double critValDotNineNineZero = 1.943;

    if (adjTestStat > critValDotNineNineZero)
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
            + " is greater than the modified critical value " + critValDotNineNineZero + "\n");
        System.out.println(" Therefore reject the null hypothesis that the data is"
            + " from an exponential distribution using a 99% CI");
        return false;
    }
    else if (adjTestStat <= critValDotNineNineZero && adjTestStat > critValDotNineSevenFive)
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
            + " is less than the modified critical value " + critValDotNineNineZero
            + " but greater than the modified critical value " + critValDotNineSevenFive + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis, that the"
            + " data is from an exponential distribution using a 99% CI\n");
        return true;
    }
    else if (adjTestStat <= critValDotNineSevenFive && adjTestStat > critValDotNineFiveZero)
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
            + " is less than the modified critical value " + critValDotNineSevenFive
            + " but greater than the modified critical value " + critValDotNineFiveZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis, that the"
            + " data is from an exponential distribution using a 97.5% CI \n");
        return true;
    }
    else if (adjTestStat <= critValDotNineFiveZero && adjTestStat > critValDotNineZeroZero)
    {
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
            + " is less than the modified critical value " + critValDotNineFiveZero
            + " but greater than the modified critical value " + critValDotNineZeroZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis, that the"
            + " data is from an exponential distribution using a 95% CI \n");
        return true;
    }
    else
    {
        // Remaining case: adjTestStat <= critValDotNineZeroZero
        System.out.println(" \n The adjusted A-D test statistic " + adjTestStat
            + " is less than the modified critical value " + critValDotNineZeroZero + "\n");
        System.out.println(" Therefore fail to reject the null hypothesis, that the"
            + " data is from an exponential distribution using a 90% CI \n");
        return true;
    }

} // End for CriticalValuesAndersonExpo

// _ 


// 


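/** Method: dirichletTest
 * Return Value: double answer
 * Parameter: DataObject data
 * Purpose: This method combines the table value of the guessed normal CDF at
 * cdfValue with the empirical CDF of the data at cdfValue, weighting the two
 * by weight and by the number of data elements respectively.
 */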

public static double dirichletTest(DataObject data)
{
    double sum = 0.0;
    double vectorSize = 0.0;
    double mean = 0.0;
    double stdDev = 0.0;
    double cdfValue = 0.0;        // need to read this
    double weight = 0.0;          // read from file
    double lookUpValue = 0.0;     // read from a table
    double empiricalCdf = 0.0;
    double answer = 0.0;
    double Fobserved = 0.0;
    double stdDevGuess = 0.0;     // read from a file
    double meanGuess = 0.0;       // read from a file
    double tableValue = 0.0;
    double zero = 0.0;
    double value = 0.0;
    int Fobservedrow = 0;
    int Fobservedcolumn = 0;

    // System.out.println("I'm in dirichletTest \n");

    // read weight (alpha) from file
    weight = myFileIoClass.readWeight();
    System.out.println("weight = " + weight);

    // read cdf value from file
    cdfValue = myFileIoClass.readCdfValue();
    System.out.println("cdfValue = " + cdfValue);

    // read stdDevGuess from file
    stdDevGuess = myFileIoClass.readStdDevGuess();
    System.out.println("stdDevGuess = " + stdDevGuess);

    // read meanGuess from file
    meanGuess = myFileIoClass.readMeanGuess();
    System.out.println("meanGuess = " + meanGuess);

    // Calculate the empirical CDF from your data
    empiricalCdf = empiricalCdf(data);
    System.out.println(" The empirical CDF for " + cdfValue + " = " + empiricalCdf + "\n");

    // compute F observed (x) = P( z <= ((x - mean)/stdDev) )
    Fobserved = ((cdfValue - meanGuess)/stdDevGuess); // must look up value for Fobserved from table
    System.out.println(" F observed = " + Fobserved + "\n");

    // row = tenths digit(s) of z, column = hundredths digit of z
    Fobservedrow = (int)(Fobserved * 10);
    Fobservedcolumn = (int)((Fobserved * 100) % 10);

    if (Fobserved < -3.49)
    {
        tableValue = .0002;
    }
    else if (Fobserved > 3.49)
    {
        tableValue = .9998;
    }
    else if (Fobserved < zero && Fobserved >= -3.49)
    {
        // the table holds only non-negative z; use symmetry for negative z
        value = stdNormalTable.lookUpValue(Math.abs(Fobservedrow), Math.abs(Fobservedcolumn));
        tableValue = 1 - value;
    }
    else
    {
        tableValue = stdNormalTable.lookUpValue(Math.abs(Fobservedrow), Math.abs(Fobservedcolumn));
    }

    lookUpValue = tableValue;
    System.out.println(" lookUpValue for F observed = " + lookUpValue + "\n");

    // This is only good for
    // N(mean = x, stdDev = y) must also check for Exponential...

    // compute result of test
    answer = ((weight/(weight + data.numDataElem)) * lookUpValue)
           + ((data.numDataElem/(weight + data.numDataElem)) * empiricalCdf);
    System.out.println(" Dirichlet Result = " + answer + "\n");

    return answer;

} // End for dirichletTest

//


// _ 

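/** Method: empiricalCdf
 * Return Value: double result
 * Parameter: DataObject data
 * Purpose: This method computes the empirical CDF of the data at cdfValue,
 * i.e. the fraction of data elements less than or equal to cdfValue.
 */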


public static double empiricalCdf(DataObject data) {

    int counter = 0;
    double value = 0.0;
    double NumElementsLessThanEqx = 0.0;
    double cdfValue = 3.0; //get this from a file
    double empiricalCdf = 0.0;

    // compute the empirical CDF, e.g. if data = 2, 3, 4, 5, 10, 20, 16 then F(x) (the empirical CDF)
    // at x = 6 is F(6) = 4/7: search the vector and count all values <= your x.

    // loop through the vector and count all the data that are less than or equal to the cdfValue
    for (counter = 0; counter < data.numDataElem; counter++)
    {
        value = data.data[counter];
        if (value <= cdfValue)
        {
            NumElementsLessThanEqx++;
        }
    }

    // empiricalCdf is defined by the number of data less than or equal to your cdfValue
    // divided by the number of data in the vector.
    empiricalCdf = (NumElementsLessThanEqx / data.numDataElem);

    return empiricalCdf;

} // End for empiricalCdf


// _ 

}//  end  StatTest 
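The relationship between empiricalCdf and dirichletTest can be shown in a small stand-alone sketch. This is not the thesis code: the class name DirichletCdfSketch, the array-based signatures, and the sample weight and prior value in main are assumptions made for illustration. It reuses the worked example from the listing's comment (data 2, 3, 4, 5, 10, 20, 16 and x = 6) and then blends the empirical value with a prior CDF value in the same proportions that dirichletTest uses.

```java
/** Illustrative sketch only; the names are hypothetical and not part of StatTest. */
final class DirichletCdfSketch {

    /** Fraction of observations that are less than or equal to x. */
    static double empiricalCdf(double[] data, double x) {
        int count = 0;
        for (double value : data) {
            if (value <= x) {
                count++;
            }
        }
        return count / (double) data.length;
    }

    /**
     * Weighted blend of a prior CDF value and the empirical CDF value:
     * (w / (w + n)) * prior + (n / (w + n)) * empirical.
     */
    static double estimate(double weight, int n, double priorCdf, double empiricalCdf) {
        double total = weight + n;
        return (weight / total) * priorCdf + (n / total) * empiricalCdf;
    }

    public static void main(String[] args) {
        double[] data = {2, 3, 4, 5, 10, 20, 16};
        double fn = empiricalCdf(data, 6.0); // four of the seven values are <= 6
        System.out.println("Empirical CDF at 6 = " + fn);
        // weight 3.0 and prior value 0.5 are made-up inputs for the demonstration
        System.out.println("Blended estimate = " + estimate(3.0, data.length, 0.5, fn));
    }
}
```

With a weight of zero the estimate reduces to the empirical CDF, and as the weight grows relative to the number of observations the estimate moves toward the prior value.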


StdNormalTable 


/**
 * StdNormalTable
 * @author MAJ Tom Cook
 * @version JDK 1.2
 * This class constructs a standard normal distribution table. The class provides methods
 * to display the table and to look up values in the table when given the row and column
 */


//


import java.io.*;
import java.util.*;

/**
 * stdNormalTable
 * @author MAJ Tom Cook
 * @version JDK 1.2
 * This class constructs a standard normal distribution table
 */


//


public class stdNormalTable

{ 

private static double[][] cdfOfz;

public stdNormalTable()
{
    int rows = 35;
    int columns = 10;
    double value = 0.0;
    boolean EOF = false;

    cdfOfz = new double[rows][columns];

    // populate the array with values from c:\data\StandardNormal
    try
    {
        // Create a file object and an input stream object for the file
        String directory = "c:/Data/Tables/"; //Directory path
        String fileName = "StandardNormalL.txt"; //File name
        File myData = new File(directory, fileName);
        BufferedReader testin = new BufferedReader(new FileReader(myData));
        String nextLine = testin.readLine();

        // one value per line, stored row by row
        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < columns; j++)
            {
                Double nextValue = Double.valueOf(nextLine);
                cdfOfz[i][j] = nextValue.doubleValue();
                //value = ((Double) cdfOfz[i][j]).doubleValue();
                nextLine = testin.readLine();
            }
        }

        testin.close(); // Close the input stream
    }
    catch (FileNotFoundException e) // Stream creation exception
    {
        System.err.println(e);
        System.exit(1); // End the program
    }
    catch (IOException e) // File read exception
    {
        System.err.println(" Error reading input file" + e);
        System.exit(1); // End the program
    }

    // displayTable(cdfOfz);

} // end table


// 


/** Method: displayTable
 * Return Value: void
 * Parameter: double[][] array
 * Purpose: This method displays the standard Normal Table
 */


public static void displayTable(double[][] array)
{
    int rows = 35;
    int columns = 10;

    System.out.println();

    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < columns; j++)
        {
            System.out.print(array[i][j] + " ");
        }
        System.out.println();
    }

    System.out.println();

} // End for displayTable

// _ 


/** Method: lookUpValue
 * Return Value: double result
 * Parameter: int row, int column
 * Purpose: This method returns the table entry stored at the given row and column
 */


public static double lookUpValue(int row, int column)
{
    return cdfOfz[row][column];

} // End for lookUpValue

} // end StdNormalTable
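The table is addressed by row and column, where dirichletTest derives the row from the tenths of the z value and the column from its hundredths digit by truncating casts. The sketch below shows one possible variation on that index computation, not the method used in the listing: it rounds the absolute z value to the nearest hundredth before splitting it, because products such as 0.29 * 100 are not exactly representable in binary floating point and a truncating cast can then land one column low. The class and method names are hypothetical, and the caller is assumed to have already restricted |z| to at most 3.49 so that the row stays within the 35-row table.

```java
/** Illustrative sketch only; not part of the stdNormalTable class. */
final class ZTableIndexSketch {

    /**
     * Returns {row, column} for |z| rounded to two decimals,
     * where row is the number of tenths and column is the hundredths digit.
     */
    static int[] index(double z) {
        int hundredths = (int) Math.round(Math.abs(z) * 100.0);
        return new int[] { hundredths / 10, hundredths % 10 };
    }

    public static void main(String[] args) {
        for (double z : new double[] {1.23, -0.29, 3.49}) {
            int[] rc = index(z);
            System.out.println("z = " + z + " -> row " + rc[0] + ", column " + rc[1]);
        }
    }
}
```

For a negative z the looked-up value would still be subtracted from 1, as the listing does, since the table stores only the non-negative half of the distribution.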




APPENDIX  B:  MSHN  WRAPPER  OUTPUT:  BISON  APPLICATION 


Read  Local  File  Between  Call  Interval  Data 


0.0002379420 

0.0008879900 

0.0001699920 

0.0021100000 

0.0020610100 

0.0020589800 

0.0021688900 

0.0021679400 

0.0020650600 

0.0020550500 

0.0020610100 

0.0021659100 

0.0020630400 

0.0021709200 

0.0020589800 

0.0020519500 

0.0020610100 

0.0023119400 

0.0020589800 

0.0021649600 

0.00212598 

0.00205505 

0.00205898 

0.00205505 

0.00206196 

0.00205195 

0.00206304 

0.00223601 

0.000611067 

0.000177979 


0.00657201 

0.00290203 

0.00281799 

0.00292504 

0.002823 

0.00279796 

0.002823 

0.00279891 

0.00293291 

0.00279796 

0.00281799 

0.00281501 

0.00293195 

0.00279403 

0.00282204 

0.00279903 

0.00293207 

0.00279796 

0.00281906 

0.00285602 

0.00282395 

0.00279701 

0.002823 

0.00280094 

0.00299001 

0.00285995 

0.00281703 

0.00290501 

0.00281799 

0.00279999 


0.010213 

0.00279593 

0.00293601 

0.00279701 

0.00281906 

0.00279593 

0.00281799 

0.00279605 

0.00282001 

0.00279593 

0.00298989 

0.00279391 

0.00281107 

0.00290596 

0.00281596 

0.00279605 

0.00281703 

0.00279403 

0.00293696 

0.00279808 

0.002823 

0.00279403 

0.00281692 

0.00279498 

0.00288188 

0.00279701 

0.00301504 

0.00280106 

0.00288308 

0.00279903 


0.00282001 

0.00279701 

0.00293398 

0.00285399 

0.023878 

0.00279701 

0.00292802 

0.0027951 

0.00281894 

0.00279796 

0.002823 

0.00279605 

0.00298607 

0.00279593 

0.00282097 

0.00279295 

0.00282502 

0.002859 

0.00282001 

0.00279593 

0.00294006 

0.00279701 

0.00297093 

0.00279403 

0.00282001 

0.00279403 

0.00282001 

0.00290489 

0.00293601 

0.00279701 


0.00282204 

0.00279605 

0.00281906 

0.00278103

Read Local File Between Call Interval
Descriptive Statistics

The Data object has 124 elements in its array

DESCRIPTIVE STATISTICS

Minimum = 1.69992E-4
Maximum = 0.023878
Range = 0.023708008
Sum = 0.3536384799999999
Mean = 0.002851923225806451
median = 0.00279897
Variance = 4.50934949772738E-6
Std Dev = 0.0021235228978580332
Skewness = 8.336133910535583
Kurtosis = 80.37299019256903

95% CI = 3.7528435864183767E-4 conducted with sample StdDev NOT population StdDev
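Several of the quantities in these blocks are related by simple identities: Range is Maximum minus Minimum, Mean is Sum divided by the number of elements, and Std Dev is the square root of Variance. The sketch below shows one way such figures could be computed. It is an assumption-laden illustration rather than the wrapper's own code: the class name is hypothetical, and the variance uses the sample (n - 1) divisor, which the output does not confirm.

```java
/** Illustrative sketch only; not the code that produced the output in this appendix. */
final class DescriptiveStatsSketch {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Sample variance with an (n - 1) divisor (an assumption, see the lead-in). */
    static double sampleVariance(double[] x) {
        double m = mean(x);
        double squares = 0.0;
        for (double v : x) {
            squares += (v - m) * (v - m);
        }
        return squares / (x.length - 1);
    }

    /** Maximum minus minimum. */
    static double range(double[] x) {
        double min = x[0];
        double max = x[0];
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return max - min;
    }

    public static void main(String[] args) {
        double[] x = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up sample
        System.out.println("Mean     = " + mean(x));
        System.out.println("Variance = " + sampleVariance(x));
        System.out.println("Std Dev  = " + Math.sqrt(sampleVariance(x)));
        System.out.println("Range    = " + range(x));
    }
}
```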


Read  Local  File  System  Call  Interval  Data 


4.2804700 

4.3248200 

5.0572000 

4.2871400 

4.2779500 

4.2841100 

4.3633400 

4.8517200 

4.2793300 

4.2727300 

4.2767800 

4.3813400 

4.2897700 

4.2685100 

4.2752200 

4.2883300 

4.8514900 

4.2781600 

4.2905900 

4.3051400 

4.4577200 

4.3358400 

4.2650800 

4.3221000 

4.2826300 

4.2886700 

4.2838700 

4.2846100 

4.2765400 

4.2821100 

4.2656600 

4.2981500 

4.2876900 

4.2636600 

4.2878800 

4.3124700 

4.3591400 

4.3370800 

4.2700800 

4.2706600 

4.2689400 

4.2719700 

4.3264600 

4.2770500 

5.2796300 

4.2914000 

4.2860700 

4.3732800 

5.8598100 

4.2747400 

4.2713500 

4.2744500 

4.9625300 

4.2895200 

4.4054700 

4.3359100 

5.1148200 

4.2917600 

4.2724700 

4.2859500 

5.5357400 

4.2637300 

4.2857800 

4.2825800 

4.2941700 

4.3723800 

4.2873200 

4.2754500 

4.3399600 

4.3196500 

4.2764100 

4.2761000 

4.2847000 

4.2878600 

4.3370800 

4.2735200 

4.2887200 

4.2811500 

4.2635900 

4.2808300 

4.4864200 

4.2884400 

4.2868200 

4.2807100 

4.3925000 

4.2715200 

4.2872100 

4.3273500 

4.2788300 

4.2858900 

4.3940100 

4.2799300 

4.2915100 

4.2824600 

4.2889800 

4.2797300 

4.2805600 

4.3515000 

4.2904000 

4.2771400 

4.2733500 

4.3490200 

4.2801300 

4.3448400 

4.3404800 

4.2730500 

4.2690900 

4.2845000 

4.2778900 

4.2885100 

4.4165500 

4.2836200 

4.2937900 

4.6632300 

4.2707600 

4.2767000 

4.2794100 

5.1722800 

4.2835200 

4.3276600 

4.2838300 

5.6836200 

4.2850900 

4.2789000

Read Local File System Call Interval
Descriptive Statistics

The Data object has 124 elements in its array


DESCRIPTIVE STATISTICS


Minimum = 4.26359
Maximum = 5.85981
Range = 1.5962200000000006
Sum = 542.9178100000001
Mean = 4.378369435483872
median = 4.286445
Variance = 0.07669490164276818
StdDev = 0.27693844377906107
Skewness = 3.6271420741918203
Kurtosis = 13.337311135732914

95% CI = 0.04894256914381611 conducted with sample StdDev NOT population StdDev

Write Local File Between Call Interval Data


0.0156031

0.0073600 

0.0084820 

0.0057191 

0.0088260 

0.0058780 

0.0082260 

0.0055970 

0.0079370 

0.0048281 

0.0071050 

0.0063371 

0.0079560 

0.0047760 

0.0066360 

0.0032510 

0.0077579 

0.0036781 

0.0080320 

0.0028909 

0.0090940 

0.0033259 

0.0098050 

0.0027220 

0.0080670 

0.0002149 

0.0016660 

0.0090051 

0.0058440 

0.0056130 


0.0047240 

0.0047650 

0.0047050 

0.0047389 

0.0053140 

0.0060949 

0.0049460 

0.0051309 

0.0054439 

0.0044271 

0.0456740 

0.0830749 

0.0780929 

3.0012600 

0.0065221 

0.0065230 

0.0064651 

0.0057050 

0.0056670 

0.0056560 

0.0060140 

0.0051100 

0.0048591 

0.0049671 

0.0047621 

0.0048649 

0.0047571 

0.0048111

0.0047209 

0.0048590 


0.0048000 

0.0050629 

0.0048571 

0.0048400 

0.0050490 

0.0048760 

0.0048801 

0.0048140 

0.0049889 

0.0051869 

0.0052869 

0.0053390 

0.0049470 

0.0050730 

0.0049139 

0.0049650 

0.0047780 

0.0049460 

0.0047720 

0.0048940 

0.0048590 

0.0049939 

0.0048341 

0.0060340 

0.0048890 

0.0046971 

0.0047809 

0.0046860 

0.0047491 

0.0046630 


0.0017691 

0.0034690 

0.0017691 

0.0025539 

0.0009520 

0.0010110 

0.0015650 

0.0009500 

0.0009520 

0.0009490 

0.0009480 

0.0009490 

0.0009520 

0.0009481 

0.0009500 

0.0009509 

0.0009511 

0.0009490 

0.0009470 

0.0009550 

0.0009500 

0.0009511 

0.0009480 

0.0009490 

0.0009569 

0.0009520 

0.0009481 

0.0009511

0.0009490 

0.0005450 


0.0038041

Write Local File Between Call Interval
Descriptive Statistics

The Data object has 121 elements in its array


DESCRIPTIVE STATISTICS


Minimum = 2.14934E-4
Maximum = 3.00126
Range = 3.0010450659999997
Sum = 3.723687764
Mean = 0.030774279041322315
median = 0.00484002
Variance = 0.07425892790543169
StdDev = 0.27250491354364914
Skewness = 10.974487295973677
Kurtosis = 120.6173104054261

95% CI = 0.04875731547175822 conducted with sample StdDev NOT population StdDev

Write Local File System Call Interval Data


0.0005690 

0.0005760 

0.0006280 

0.0005569 

0.0006289 

0.0005690 

0.0006280 

0.0005630 

0.0006330 

0.0005519 

0.0006210 

0.0005690 

0.0006419 

0.0005510 

0.0006059 

0.0005391 

0.0006311 

0.0006330 

0.0006050 

0.0005460 

0.0006270 

0.0005521 

0.0006360 

0.0005389 

0.0006450 

0.0004971 

0.0004960 

0.0017310 

0.0005690 

0.0004770 


0.0005190 

0.0004480 

0.0005220 

0.0004520 

0.0005701 

0.0004400 

0.0005771 

0.0004561 

0.0005640 

0.0004290 

0.0006330 

0.0005311 

0.0006311

0.0005779 

0.0005629 

0.0004500 

0.0005640 

0.0004380 

0.0005680 

0.0004380 

0.0005651 

0.0004629 

0.0005499 

0.0004519 

0.0005450 

0.0004480 

0.0005460 

0.0004860 

0.0005420 

0.0004450 


0.0005051 

0.0004460 

0.0005920 

0.0004300 

0.0005740 

0.0004330 

0.0005579 

0.0004281 

0.0005341 

0.0004281 

0.0005630 

0.0004910 

0.0005320 

0.0004431 

0.0005350 

0.0004461 

0.0005410 

0.0004420 

0.0005360 

0.0004491 

0.0005420 

0.0004660 

0.0005410 

0.0004810 

0.0005829 

0.0004320 

0.0005530 

0.0004300 

0.0005370 

0.0004690 


0.0006729 

0.0004431 

0.0005330 

0.0004431 

0.0004820 

0.0005420 

0.0004890 

0.0005630 

0.0004901 

0.0006260 

0.0004860 

0.0005389 

0.0004910 

0.0005610 

0.0004921 

0.0005660 

0.0004940 

0.0005660 

0.0004849 

0.0005270 

0.0004880 

0.0005530 

0.0004920 

0.0005610 

0.0004860 

0.0005580 

0.0004870 

0.0005670 

0.0004820 

0.0005630 


0.0005070

Write Local File System Call Interval
Descriptive Statistics

The Data object has 121 elements in its array


DESCRIPTIVE STATISTICS


Minimum = 4.28081E-4
Maximum = 0.00173104
Range = 0.001302959
Sum = 0.06520473700000001
Mean = 5.388821239669423E-4
median = 5.38945E-4
Variance = 1.5787897999059505E-8
StdDev = 1.2564990250318344E-4
Skewness = 7.188071883660956
Kurtosis = 68.05663344645232

95% CI = 2.2481620076777346E-5 conducted with sample StdDev NOT population StdDev

Write Remote File Between Call Interval Data


0.0001320 

0.0001349 

0.0001349 

0.0001330 

0.0001360 

0.0001321 

0.0001309 

0.0001370 

0.0001351 

0.0001320 

0.0001320 

0.0001320 

0.0001340 

0.0001310 

0.0001321 

0.0001329 

0.0001340 

0.0001321 

0.0001320 

0.0001301 

0.0001330 

0.0001320 

0.0001329 

0.0001400 

0.0001340 

0.0001321 

0.0001320 

0.0001310 

0.0001370 

0.0001330 

0.0001310 

0.0001340 

0.0001321 

0.0001321 

0.0001340 

0.0001330 

0.0001340 

0.0001330 

0.0001961 

0.0001329 

0.0001310 

0.0001329 

0.0001329 

0.0001409 

0.0001330 

0.0001330 

0.0001340 

Write Remote File Between Call Interval
Descriptive Statistics

The Data object has 47 elements in its array


DESCRIPTIVE STATISTICS


Minimum = 1.30057E-4
Maximum = 1.96099E-4
Range = 6.604200000000002E-5
Sum = 0.006326914999999999
Mean = 1.3461521276595742E-4
median = 1.32918E-4
Variance = 8.848993495374656E-11
Std Dev = 9.406908894729796E-6
Skewness = 6.334653681690928
Kurtosis = 42.011280023941055

95% CI = 2.7184671756673306E-6 conducted with sample StdDev NOT population StdDev

Write Remote File System Call Interval Data

0.0007430 

0.0007371 

0.0007130 

0.0007449 

0.0023741 

0.0007360 

0.0031611 

0.0007250 

0.0025040 

0.0007470 

0.0007380 

0.0007499 

0.0007401 

0.0007390 

0.0007399 

0.0023891 

0.0007451 

0.0023190 

0.0007430 

0.0032979 

0.0007380 

0.0007510 

0.0007430 

0.0007560 

0.0007440 

0.0007130 

0.0024420 

0.0007451 

0.0031530 

0.0007470 

0.0007490 

0.0007370 

0.0007410 

0.0007380 

0.0007499 

0.0025049 

0.0007360 

0.0032489 

0.0007509 

0.0007340 

0.0007460 

0.0007380 

0.0007421 

0.0007441 

0.0028380 

0.0007449 

0.0023710 

Write Remote File System Call Interval
Descriptive Statistics


The Data object has 47 elements in its array


DESCRIPTIVE STATISTICS


Minimum = 7.12991E-4
Maximum = 0.00329792
Range = 0.002584929
Sum = 0.058531985999999994
Mean = 0.001245361404255319
median = 7.44939E-4
Variance = 7.954153236570723E-7
Std Dev = 8.918605965379748E-4
Skewness = 1.328469203612622
Kurtosis = 0.028706766384108295

95% CI = 2.5773543510322367E-4 conducted with sample StdDev NOT population StdDev

APPENDIX C: EXPERIMENT RESULTS


Read  Local  File  Between  Call  Interval  Results 

TEST: K-S Test for normal distribution
NULL HYPOTHESIS: Data is from a population with an underlying normal distribution
RESULTS: The adjusted K-S test statistic 5.017385802574527 is greater than the modified critical value 1.035. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST: K-S Test for exponential distribution
NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted K-S test statistic 5.389212805739073 is greater than the modified critical value 1.308. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.

TEST: A-D Test for normal distribution
NULL HYPOTHESIS: Data is from a population with an underlying normal distribution
RESULTS: The adjusted A-D test statistic 30.677626699552437 is greater than the modified critical value 1.029. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST: A-D Test for exponential distribution
NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted A-D test statistic 35.577424141814724 is greater than the modified critical value 1.943. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.
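For readers who want to see how an adjusted A-D statistic for the exponential case can be obtained from raw data, the following stand-alone sketch follows the formula documented for AndersonDarlingExpo in Appendix A, with Z_i = 1 - e^(-x_(i)/mean) and the (1 + 0.6/n) adjustment. It is a sketch under stated assumptions rather than the thesis implementation: the class name and the array-based signature are hypothetical, the sample is sorted inside the method, and all observations are assumed to be strictly positive (the listing instead skips terms whose logarithm is undefined).

```java
import java.util.Arrays;

/** Illustrative sketch only; names are hypothetical and not part of StatTest. */
final class AndersonDarlingExpoSketch {

    /** A-D statistic against an exponential distribution with the sample mean as its mean. */
    static double statistic(double[] sample) {
        double[] x = sample.clone();
        Arrays.sort(x); // the formula is defined on the order statistics
        int n = x.length;

        double total = 0.0;
        for (double v : x) {
            total += v;
        }
        double mean = total / n;

        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            double zi = 1.0 - Math.exp(-x[i - 1] / mean);     // Z_i
            double oneMinusZ = Math.exp(-x[n - i] / mean);     // 1 - Z_(n+1-i)
            sum += (2 * i - 1) * (Math.log(zi) + Math.log(oneMinusZ));
        }
        return (-sum / n) - n;
    }

    /** Small-sample adjustment used in the listing: (1 + 0.6/n) * ADStat. */
    static double adjusted(double adStat, int n) {
        return (1.0 + 0.6 / n) * adStat;
    }

    public static void main(String[] args) {
        double[] sample = {0.4, 1.1, 0.2, 2.5, 0.9, 1.7, 0.3, 3.2}; // made-up data
        double a2 = statistic(sample);
        System.out.println("ADStat         = " + a2);
        System.out.println("AdjustedADStat = " + adjusted(a2, sample.length));
    }
}
```

The adjusted value is what would then be compared against the modified critical values 1.070, 1.326, 1.587, and 1.943.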


Read  Local  File  System  Call  Interval  Results 

TEST: K-S Test for normal distribution
NULL HYPOTHESIS: Data is from a population with an underlying normal distribution
RESULTS: The adjusted K-S test statistic 4.022949627920754 is greater than the modified critical value 1.035. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST: K-S Test for exponential distribution
NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted K-S test statistic 7.101488327130617 is greater than the modified critical value 1.308. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.

TEST: A-D Test for normal distribution
NULL HYPOTHESIS: Data is from a population with an underlying normal distribution
RESULTS: The adjusted A-D test statistic 29.938894071092253 is greater than the modified critical value 1.029. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST: A-D Test for exponential distribution
NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted A-D test statistic 53.527837656652814 is greater than the modified critical value 1.943. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.
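The decision reported in each A-D result comes from comparing the adjusted statistic with four modified critical values, one per 1 - alpha level (.90, .95, .975, .99). The sketch below expresses that comparison as a table lookup instead of the chained if/else used in Appendix A. The class and method names are hypothetical; the critical values are the ones listed in the CriticalValuesAndersonNormal and CriticalValuesAndersonExpo methods.

```java
/** Illustrative sketch only; a table-driven variant of the critical-value comparison. */
final class CriticalValueDecisionSketch {

    static final double[] LEVELS = {0.90, 0.95, 0.975, 0.99};
    static final double[] NORMAL = {0.632, 0.751, 0.870, 1.029};
    static final double[] EXPONENTIAL = {1.070, 1.326, 1.587, 1.943};

    /**
     * Returns the lowest tabulated 1 - alpha level at which the null hypothesis
     * is not rejected, or NaN when the statistic exceeds every critical value
     * (that is, the null hypothesis is rejected even at the .99 level).
     */
    static double firstNonRejectingLevel(double adjTestStat, double[] critical) {
        for (int i = 0; i < critical.length; i++) {
            if (adjTestStat <= critical[i]) {
                return LEVELS[i];
            }
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        double stat = 53.527837656652814; // adjusted A-D statistic from the result above
        double level = firstNonRejectingLevel(stat, EXPONENTIAL);
        System.out.println(Double.isNaN(level)
            ? "Reject the null hypothesis at every tabulated level"
            : "Fail to reject the null hypothesis at the " + level + " level");
    }
}
```

Keeping the critical values in arrays means the same method can serve both the normal and the exponential tests.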


Write  Local  File  Between  Call  Interval  Results 

TEST: K-S Test for normal distribution
NULL HYPOTHESIS: Data is from a population with an underlying normal distribution
RESULTS: The adjusted K-S test statistic 5.38801483546206 is greater than the modified critical value 1.035. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST: K-S Test for exponential distribution
NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted K-S test statistic 7.833717859887696 is greater than the modified critical value 1.308. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.

TEST: A-D Test for normal distribution
NULL HYPOTHESIS: Data is from a population with an underlying normal distribution
RESULTS: The adjusted A-D test statistic 45.53619992778167 is greater than the modified critical value 1.029. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST: A-D Test for exponential distribution
NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted A-D test statistic 104.13662957631682 is greater than the modified critical value 1.943. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.


Write Local File System Call Interval Results

TEST:  K-S  Test  for  normal  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  normal  distribution 
RESULTS:  The  adjusted  K-S  test  statistic  2.3574479954921 1 1  is  greater  than  the  modified  critical  value 
1 .035.  Therefore,  reject  the  null  hypothesis  that  the  data  is  from  a  normal  distribution  using  a  99%  CL 

TEST:  K-S  Test  for  exponential  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  exponential  distribution 
RESULTS:  The  adjusted  K-S  test  statistic  6, 178306629193417  is  greater  than  the  modified  critical  value 
1.308.  Therefore,  reject  the  null  hypothesis  that  the  data  is  from  an  exponential  distribution  using  a  99%  CL 


TEST:  A-D  Test  for  normal  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  normal  distribution 

RESULTS: The adjusted A-D test statistic 10.160396396726927 is greater than the modified critical value
1.029. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST:  A-D  Test  for  exponential  distribution 

NULL HYPOTHESIS: Data is from a population with an underlying exponential distribution
RESULTS: The adjusted A-D test statistic 42.19919343201918 is greater than the modified critical value
1.943. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.
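
The K-S decisions listed in this appendix compare an adjusted test statistic with a modified critical value (1.035 for the normal case and 1.308 for the exponential case). The sketch below shows one way such a comparison can be computed. It assumes the commonly tabulated adjustments for distributions whose parameters are estimated from the sample, which is consistent with the critical values reported here; it is not the thesis implementation. The function names and the sample intervals are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

# Modified critical values at the 99% level, as quoted in the results above.
KS_CRIT_NORMAL_99 = 1.035
KS_CRIT_EXPON_99 = 1.308


def ks_distance(sample, cdf):
    """Largest gap D_n between the empirical cdf and a fitted cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Check the gap just after and just before each jump of the empirical cdf.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def adjusted_ks_normal(sample):
    """Adjusted K-S statistic, normal fit with mean and variance estimated (assumed form)."""
    n = len(sample)
    fitted = NormalDist(fmean(sample), stdev(sample))
    d = ks_distance(sample, fitted.cdf)
    return (math.sqrt(n) - 0.01 + 0.85 / math.sqrt(n)) * d


def adjusted_ks_exponential(sample):
    """Adjusted K-S statistic, exponential fit with mean estimated (assumed form)."""
    n = len(sample)
    mean = fmean(sample)
    d = ks_distance(sample, lambda x: 1.0 - math.exp(-x / mean))
    return (d - 0.2 / n) * (math.sqrt(n) + 0.26 + 0.5 / math.sqrt(n))


def verdict(statistic, critical):
    """Reject the null hypothesis when the adjusted statistic exceeds the critical value."""
    return "reject" if statistic > critical else "fail to reject"


# Hypothetical intervals: tightly clustered values plus one large outlier.
intervals = [0.0028] * 20 + [0.0239]
print("normal:", verdict(adjusted_ks_normal(intervals), KS_CRIT_NORMAL_99))
print("exponential:", verdict(adjusted_ks_exponential(intervals), KS_CRIT_EXPON_99))
```

Samples of this shape, with most values nearly identical and a few much larger, tend to produce large K-S distances under both fitted distributions.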

Write  Remote  File  Between  Call  Interval  Results 

TEST:  K-S  Test  for  normal  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  normal  distribution 

RESULTS: The adjusted K-S test statistic 2.48335644275209026 is greater than the modified critical value
1.035. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST:  K-S  Test  for  exponential  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  exponential  distribution 
RESULTS: The adjusted K-S test statistic 4.422383945811191 is greater than the modified critical value
1.308. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.


TEST:  A-D  Test  for  normal  distribution 

NULL HYPOTHESIS: Data is from a population with an underlying normal distribution

RESULTS: The adjusted A-D test statistic 11.454589644697467 is greater than the modified critical value
1.029. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST:  A-D  Test  for  exponential  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  exponential  distribution 
RESULTS:  The  adjusted  A-D  test  statistic  20.68485876216289  is  greater  than  the  modified  critical  value 
1.943. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.


Write  Remote  File  System  Call  Interval  Results 

TEST:  K-S  Test  for  normal  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  normal  distribution 
RESULTS:  The  adjusted  K-S  test  statistic  3.1369014322242457  is  greater  than  the  modified  critical  value 
1.035. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST:  K-S  Test  for  exponential  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  exponential  distribution 
RESULTS:  The  adjusted  K-S  test  statistic  3.1028894423071307  is  greater  than  the  modified  critical  value 
1.308. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.


TEST:  A-D  Test  for  normal  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  normal  distribution 
RESULTS: The adjusted A-D test statistic 10.11692228867237 is greater than the modified critical value
1.029. Therefore, reject the null hypothesis that the data is from a normal distribution using a 99% CI.

TEST:  A-D  Test  for  exponential  distribution 

NULL  HYPOTHESIS:  Data  is  from  a  population  with  an  underlying  exponential  distribution 
RESULTS: The adjusted A-D test statistic 8.704621245573474 is greater than the modified critical value
1.943. Therefore, reject the null hypothesis that the data is from an exponential distribution using a 99% CI.
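
The A-D decisions above follow the same pattern as the K-S decisions: an adjusted statistic is compared with a modified critical value (1.029 for the normal case and 1.943 for the exponential case). The following is a minimal sketch of that computation, assuming the commonly tabulated small-sample adjustments for fitted normal and exponential distributions. The function names, the clamping guard, and the sample intervals are assumptions made for illustration and are not taken from the thesis code.

```python
import math
from statistics import NormalDist, fmean, stdev

# Modified critical values at the 99% level, as quoted in the results above.
AD_CRIT_NORMAL_99 = 1.029
AD_CRIT_EXPON_99 = 1.943


def anderson_darling(sample, cdf):
    """A^2 = -n - (1/n) * sum((2i - 1) * [ln F(x_i) + ln(1 - F(x_(n+1-i)))])."""
    xs = sorted(sample)
    n = len(xs)
    # Numerical guard (an assumption of this sketch): keep F strictly inside (0, 1)
    # so that neither logarithm receives zero.
    f = [min(max(cdf(x), 1e-12), 1.0 - 1e-12) for x in xs]
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
    return -n - total / n


def adjusted_ad_normal(sample):
    """Adjusted A-D statistic, normal fit with mean and variance estimated (assumed form)."""
    n = len(sample)
    fitted = NormalDist(fmean(sample), stdev(sample))
    return (1.0 + 4.0 / n - 25.0 / n**2) * anderson_darling(sample, fitted.cdf)


def adjusted_ad_exponential(sample):
    """Adjusted A-D statistic, exponential fit with mean estimated (assumed form)."""
    n = len(sample)
    mean = fmean(sample)
    a2 = anderson_darling(sample, lambda x: 1.0 - math.exp(-x / mean))
    return (1.0 + 0.6 / n) * a2


# Hypothetical intervals: tightly clustered values plus one large outlier.
intervals = [0.0028] * 20 + [0.0239]
for name, stat, crit in [
    ("normal", adjusted_ad_normal(intervals), AD_CRIT_NORMAL_99),
    ("exponential", adjusted_ad_exponential(intervals), AD_CRIT_EXPON_99),
]:
    decision = "reject" if stat > crit else "fail to reject"
    print(f"A-D {name}: adjusted statistic {stat:.3f}, critical {crit}, {decision}")
```

Because the A-D statistic weights the tails of the distribution more heavily than the K-S statistic does, a single extreme interval can contribute a large share of the total.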

APPENDIX D: MINITAB RESULTS

Descriptive Statistics

Variable: rlfbci

[Graphical summary plots not reproduced; the surviving panel titles are the 95% Confidence Intervals for Mu and Median]

Anderson-Darling Normality Test
A-Squared:       29.469
P-Value:         0.000

Mean             2.85E-03
StDev            2.12E-03
Variance         4.51E-06
Skewness         8.33614
Kurtosis         80.3731
N                124

Minimum          1.70E-04
1st Quartile     2.78E-03
Median           2.80E-03
3rd Quartile     2.82E-03
Maximum          2.39E-02

95% Confidence Interval for Mu
2.47E-03  3.23E-03

95% Confidence Interval for Sigma
1.89E-03  2.43E-03

95% Confidence Interval for Median
2.80E-03  2.82E-03


Figure  21:  Minitab  Results  For  rlfbci  Data 

Descriptive Statistics

Variable: rlfsci

[Graphical summary plots not reproduced; the surviving panel title is the 95% Confidence Interval for Median]

Anderson-Darling Normality Test
A-Squared:       29.406
P-Value:         0.000

Mean             4.37837
StDev            0.27694
Variance         7.67E-02
Skewness         3.62714
Kurtosis         13.3373
N                124

Minimum          4.26359
1st Quartile     4.27791
Median           4.28645
3rd Quartile     4.33380
Maximum          5.85981

95% Confidence Interval for Mu
4.32914  4.42760

95% Confidence Interval for Sigma
0.24623  0.31646

95% Confidence Interval for Median
4.28375  4.28869


Figure 22: Minitab Results For rlfsci Data
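
The 95% Confidence Interval for Mu in Figure 22 can be checked from the reported Mean, StDev, and N alone. The sketch below assumes the interval is the usual t-based interval, mean plus or minus t times StDev divided by the square root of N. Because the Python standard library has no Student-t quantile, the sketch substitutes a Cornish-Fisher series approximation, which is adequate for the degrees of freedom seen here. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile from a Cornish-Fisher expansion in 1/df."""
    z = NormalDist().inv_cdf(p)
    return (
        z
        + (z**3 + z) / (4 * df)
        + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)
    )


def mean_confidence_interval(mean, stdev, n, level=0.95):
    """Two-sided t-based confidence interval for the population mean."""
    t = t_quantile(0.5 + level / 2, n - 1)
    half_width = t * stdev / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary values as reported for Figure 22.
low, high = mean_confidence_interval(mean=4.37837, stdev=0.27694, n=124)
print(f"computed: {low:.5f} {high:.5f}")
print("reported: 4.32914 4.42760")
```

The same function can be applied to the other summaries in this appendix, although values reported to only three significant digits will agree less closely.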

Descriptive Statistics

Variable: wlfbci

[Graphical summary plots not reproduced; the surviving panel titles are the 95% Confidence Intervals for Mu and Median]

Anderson-Darling Normality Test
A-Squared:       43.971
P-Value:         0.000

Mean             0.030774
StDev            0.272505
Variance         7.43E-02
Skewness         10.9745
Kurtosis         120.617
N                121

Minimum          0.00021
1st Quartile     0.00216
Median           0.00486
3rd Quartile     0.00569
Maximum          3.00126

95% Confidence Interval for Mu
-0.01827  0.07982

95% Confidence Interval for Sigma
0.24196  0.31195

95% Confidence Interval for Median
0.00477  0.00495


Figure 23: Minitab Results For wlfbci Data
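
The 95% Confidence Interval for Sigma in the wlfbci summary (StDev 0.272505, N 121) can be checked in a similar way. The sketch below assumes the standard chi-square interval for a standard deviation, with bounds StDev times the square root of (N - 1) divided by the upper and lower chi-square quantiles. The chi-square quantile is approximated with the Wilson-Hilferty formula because the Python standard library does not provide one, so agreement to about four decimal places is the most that should be expected. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty cube-root approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def sigma_confidence_interval(stdev, n, level=0.95):
    """Two-sided chi-square confidence interval for the population standard deviation."""
    df = n - 1
    tail = (1.0 - level) / 2
    lower = stdev * math.sqrt(df / chi2_quantile(1.0 - tail, df))
    upper = stdev * math.sqrt(df / chi2_quantile(tail, df))
    return lower, upper


# Summary values as reported for the wlfbci data.
low, high = sigma_confidence_interval(stdev=0.272505, n=121)
print(f"computed: {low:.4f} {high:.4f}")
print("reported: 0.24196 0.31195")
```

This interval relies on the data being normally distributed, an assumption that the Anderson-Darling result for the same variable (P-Value 0.000) does not support.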

Descriptive Statistics

Variable: wlfsci

[Graphical summary plots not reproduced; the surviving panel titles are the 95% Confidence Intervals for Mu and Median]

Anderson-Darling Normality Test
A-Squared:       9.594
P-Value:         0.000

Mean             5.39E-04
StDev            1.26E-04
Variance         1.58E-08
Skewness         7.18826
Kurtosis         68.0590
N                121

Minimum          4.28E-04
1st Quartile     4.82E-04
Median           5.39E-04
3rd Quartile     5.68E-04
Maximum          1.73E-03

95% Confidence Interval for Mu
5.16E-04  5.61E-04

95% Confidence Interval for Sigma
1.12E-04  1.44E-04

95% Confidence Interval for Median
5.23E-04  5.52E-04


Figure 24: Minitab Results For wlfsci Data

Descriptive Statistics

Variable: wrfbci

[Graphical summary plots not reproduced; the surviving panel titles are the 95% Confidence Intervals for Mu and Median]

Anderson-Darling Normality Test
A-Squared:       11.011
P-Value:         0.000

Mean             1.35E-04
StDev            9.41E-06
Variance         8.85E-11
Skewness         6.33475
Kurtosis         42.0100
N                47

Minimum          1.30E-04
1st Quartile     1.32E-04
Median           1.33E-04
3rd Quartile     1.34E-04
Maximum          1.96E-04

95% Confidence Interval for Mu
1.32E-04  1.37E-04

95% Confidence Interval for Sigma
7.82E-06  1.18E-05

95% Confidence Interval for Median
1.32E-04  1.33E-04


Figure  25:  Minitab  Results  For  wrfbci  Data 



Descriptive Statistics

Variable: wrfsci

[Graphical summary plots not reproduced; the surviving panel title is the 95% Confidence Interval for Median]

Anderson-Darling Normality Test
A-Squared:       9.488
P-Value:         0.000

Mean             1.25E-03
StDev            8.92E-04
Variance         7.95E-07
Skewness         1.32846
Kurtosis         2.87E-02
N                47

Minimum          7.13E-04
1st Quartile     7.38E-04
Median           7.45E-04
3rd Quartile     2.32E-03
Maximum          3.30E-03

95% Confidence Interval for Mu
9.84E-04  1.51E-03

95% Confidence Interval for Sigma
7.41E-04  1.12E-03

95% Confidence Interval for Median
7.42E-04  7.50E-04


Figure  26:  Minitab  Results  For  wrfsci  Data 

APPENDIX E: ACRONYMS

cdf       Cumulative distribution function
CL        Client Library
CPU       Central Processing Unit
DARPA     Defense Advanced Research Projects Agency
DoD       Department of Defense
ETC       Expected Time for Completion
fd        File descriptor
I/O       Input and/or Output
MSHN      Management System for Heterogeneous Networks
NWS       Network Weather Service
OLB       Opportunistic Load Balancing
OS        Operating System
QoS       Quality of Service
RDT&E     Research, Development, Testing, and Evaluation
rlfbci    read local file between call interval
rlfsci    read local file system call interval
RMS       Resource Management System
RRD       Resource Requirements Database
RSS       Resource Status Server
SA        Scheduling Advisor
wlfbci    write local file between call interval
wlfsci    write local file system call interval
wrfbci    write remote file between call interval
wrfsci    write remote file system call interval